Vassilis Lyberatos

Semantic-Aware Interpretable Multimodal Music Auto-Tagging

May 26, 2025

Andreas Patakis, Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou

Abstract:Music auto-tagging is essential for organizing and discovering music in extensive digital libraries. While foundation models achieve exceptional performance in this domain, their outputs often lack interpretability, limiting trust and usability for researchers and end-users alike. In this work, we present an interpretable framework for music auto-tagging that leverages groups of musically meaningful multimodal features, derived from signal processing, deep learning, ontology engineering, and natural language processing. To enhance interpretability, we cluster features semantically and employ an expectation maximization algorithm, assigning distinct weights to each group based on its contribution to the tagging process. Our method achieves competitive tagging performance while offering a deeper understanding of the decision-making process, paving the way for more transparent and user-centric music tagging systems.

* Accepted at Interspeech 2025
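
The abstract does not spell out how the expectation maximization step assigns group weights. As an illustration only, the sketch below treats each semantic feature group as an already-trained predictor. It then applies the textbook EM update for the mixing weights of a mixture, so groups whose predictions better explain the observed tags receive larger weights. The input format, the group names, and `em_group_weights` are assumptions, not the authors' implementation.

```python
import math

def em_group_weights(probs, n_iter=200, tol=1e-10):
    """Estimate one mixing weight per feature group with EM.

    probs[i][g] is the probability that group g's (already trained) model
    assigns to the observed tag of example i. The assumed model is
        p(tag_i) = sum_g pi[g] * probs[i][g],
    and EM re-estimates pi while the per-group models stay fixed.
    """
    n_groups = len(probs[0])
    pi = [1.0 / n_groups] * n_groups  # start from uniform weights
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of group g for example i
        resp = []
        ll = 0.0
        for row in probs:
            mix = [pi[g] * row[g] for g in range(n_groups)]
            total = sum(mix)
            ll += math.log(total)  # log-likelihood under the current weights
            resp.append([m / total for m in mix])
        # M-step: each weight becomes its average responsibility
        pi = [sum(r[g] for r in resp) / len(resp) for g in range(n_groups)]
        if ll - prev_ll < tol:  # stop once the likelihood stops improving
            break
        prev_ll = ll
    return pi

# Hypothetical groups ("timbre", "lyrics", "ontology"); each row is one track.
probs = [
    [0.90, 0.50, 0.20],
    [0.80, 0.40, 0.30],
    [0.95, 0.60, 0.10],
]
weights = em_group_weights(probs)
```

Here the first group assigns the highest probability to every track's true tag, so EM moves most of the weight onto it. The weights always sum to 1, so they can be read as relative contributions.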

CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions

Oct 29, 2024

Spyridon Kantarelis, Konstantinos Thomas, Vassilis Lyberatos, Edmund Dervakos, Giorgos Stamou

Abstract:Chord progressions encapsulate important information about music, pertaining to its structure and conveyed emotions. They serve as the backbone of musical composition, and in many cases, they are the sole information required for a musician to play along and follow the music. Despite their importance, chord progressions as a data domain remain underexplored. There is a lack of large-scale datasets suitable for deep learning applications, and limited research exploring chord progressions as an input modality. In this work, we present Chordonomicon, a dataset of over 666,000 songs and their chord progressions, annotated with structural parts, genre, and release date, and created by scraping various sources of user-generated progressions and associated metadata. We demonstrate the practical utility of the Chordonomicon dataset for classification and generation tasks, and discuss its potential to provide valuable insights to the research community. Chord progressions are unique in their ability to be represented in multiple formats (e.g. text, graph) and in the wealth of information chords convey in given contexts, such as their harmonic function. These characteristics make the Chordonomicon an ideal testbed for exploring advanced machine learning techniques, including transformers, graph machine learning, and hybrid systems that combine knowledge representation and machine learning.
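
The abstract notes that a progression can be read both as text and as a graph. A minimal sketch of the two views follows. It assumes a hypothetical plain-text format in which section markers such as `<verse_1>` precede space-separated chord symbols; the actual Chordonomicon file format may differ. The text view is the token sequence, and the graph view counts transitions between consecutive chords.

```python
from collections import Counter

def parse_progression(text):
    """Split a space-separated chord string into {section marker: [chords]}.

    Assumes, hypothetically, that section markers are tokens like '<verse_1>';
    chords before any marker go under '<untagged>'.
    """
    sections = {}
    current = "<untagged>"
    for token in text.split():
        if token.startswith("<") and token.endswith(">"):
            current = token
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(token)
    return sections

def transition_graph(chords):
    """Weighted directed graph as a Counter: (chord_a, chord_b) -> count of a->b moves."""
    return Counter(zip(chords, chords[1:]))

song = "<verse_1> C G Am F C G F <chorus_1> F C G Am"
sections = parse_progression(song)
verse_graph = transition_graph(sections["<verse_1>"])
print(verse_graph[("C", "G")])  # 2: C -> G occurs twice in the verse
```

The same edge counts could be loaded into a graph library as edge weights if graph machine learning is the goal.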

BERTtime Stories: Investigating the Role of Synthetic Story Data in Language pre-training

Oct 20, 2024

Nikitas Theodoropoulos, Giorgos Filandrianos, Vassilis Lyberatos, Maria Lymperaiou, Giorgos Stamou

Abstract:We describe our contribution to the Strict and Strict-Small tracks of the 2nd iteration of the BabyLM Challenge. The shared task is centered around efficient pre-training given data constraints motivated by human development. In response, we study the effect of synthetic story data in language pre-training using TinyStories: a recently introduced dataset of short stories. Initially, we train GPT-Neo models on subsets of TinyStories, while varying the amount of available data. We find that, even with access to less than 100M words, the models are able to generate high-quality, original completions to a given story, and acquire substantial linguistic knowledge. To measure the effect of synthetic story data, we train LTG-BERT encoder models on a combined dataset of: a subset of TinyStories, story completions generated by GPT-Neo, and a subset of the BabyLM dataset. Our experimentation reveals that synthetic data can occasionally offer modest gains, but overall have a negative influence on linguistic understanding. Our work offers an initial study on synthesizing story data in low resource settings and underscores their potential for augmentation in data-constrained language modeling. We publicly release our models and implementation on our GitHub.
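
The combined training set described above mixes three sources under a fixed data budget. The sketch below shows one simple way to assemble such a mixture by word count. The per-source shares, the skip-on-overflow rule, the toy documents, and `build_mixed_corpus` are illustrative assumptions, not the proportions or procedure used in the paper.

```python
import random

def build_mixed_corpus(sources, budget_words, seed=0):
    """Sample whole documents from several sources under a total word budget.

    sources maps a source name to (documents, share), where share is the
    fraction of the budget reserved for that source (shares should sum to <= 1).
    Documents that would overflow a source's quota are skipped.
    """
    rng = random.Random(seed)
    corpus = []
    for name, (docs, share) in sources.items():
        quota = int(budget_words * share)
        used = 0
        for doc in rng.sample(docs, len(docs)):  # shuffled copy, originals untouched
            n_words = len(doc.split())
            if used + n_words > quota:
                continue
            corpus.append((name, doc))
            used += n_words
    return corpus

# Hypothetical toy sources standing in for TinyStories, GPT-Neo completions, and BabyLM text.
sources = {
    "tinystories": (["Tom had a red ball.", "The cat slept all day."], 0.4),
    "gpt_neo_completions": (["Then Tom shared the ball with Sue."], 0.3),
    "babylm": (["where is the doggy", "look at that"], 0.3),
}
corpus = build_mixed_corpus(sources, budget_words=20)
```

Counting words rather than tokens keeps the budget tokenizer-independent, which matches how the data constraint is phrased in the abstract.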

MusicLIME: Explainable Multimodal Music Understanding

Sep 16, 2024

Theodoros Sotirou, Vassilis Lyberatos, Orfeas Menis Mastromichalakis, Giorgos Stamou

Abstract:Multimodal models are critical for music understanding tasks, as they capture the complex interplay between audio and lyrics. However, as these models become more prevalent, the need for explainability grows: understanding how these systems make decisions is vital for ensuring fairness, reducing bias, and fostering trust. In this paper, we introduce MusicLIME, a model-agnostic feature importance explanation method designed for multimodal music models. Unlike traditional unimodal methods, which analyze each modality separately without considering the interaction between them, often leading to incomplete or misleading explanations, MusicLIME reveals how audio and lyrical features interact and contribute to predictions, providing a holistic view of the model's decision-making. Additionally, we enhance local explanations by aggregating them into global explanations, giving users a broader perspective of model behavior. Through this work, we contribute to improving the interpretability of multimodal music models, empowering users to make informed choices, and fostering more equitable, fair, and transparent music understanding systems.

* GitHub repository: https://github.com/IamTheo2000/MusicLIME
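
MusicLIME turns many local explanations into a global view. The abstract does not state MusicLIME's exact aggregation scheme. This sketch assumes one common choice: average the absolute weight each feature receives across the instances in which it appears. Keying features by modality also lets the same pass report how much of the total weight goes to audio versus lyrics. Feature names and weights are hypothetical.

```python
from collections import defaultdict

def global_importance(local_explanations):
    """Rank features by mean absolute local weight.

    Each local explanation maps (modality, feature) -> weight. The mean is
    taken over the instances where the feature appears.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for explanation in local_explanations:
        for feature, weight in explanation.items():
            totals[feature] += abs(weight)
            counts[feature] += 1
    scores = {f: totals[f] / counts[f] for f in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def modality_share(local_explanations):
    """Fraction of the total absolute weight attributed to each modality."""
    per_modality = defaultdict(float)
    for explanation in local_explanations:
        for (modality, _), weight in explanation.items():
            per_modality[modality] += abs(weight)
    total = sum(per_modality.values())
    return {m: w / total for m, w in per_modality.items()}

local_explanations = [
    {("audio", "drums"): 0.4, ("lyrics", "love"): -0.2},
    {("audio", "drums"): 0.2, ("lyrics", "night"): 0.6},
]
ranking = global_importance(local_explanations)
shares = modality_share(local_explanations)
print(ranking[0])  # (('lyrics', 'night'), 0.6)
```

Absolute values are used so that features pushing predictions in opposite directions on different songs still register as important.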

Perceptual Musical Features for Interpretable Audio Tagging

Jan 04, 2024

Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou

Abstract:In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque, making it difficult to explain their output for a given input. While interpretability has been emphasized in other fields such as medicine, it has received little attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We constructed a workflow that incorporates three different information extraction techniques: a) leveraging symbolic knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal processing to extract perceptual features from audio files. These features were subsequently used to train an interpretable machine-learning model for tag prediction. We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset. Our method surpassed the performance of baseline models in both tasks and, in certain instances, demonstrated competitiveness with the current state-of-the-art. We conclude that there are use cases where the deterioration in performance is outweighed by the value of interpretability.

* GitHub Repository: https://github.com/vaslyb/perceptible-music-tagging
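
The abstract describes training an interpretable model on named perceptual features. As an assumed example, a logistic regression for a single tag is interpretable because each feature's contribution to the logit is simply its weight times its value. The feature names, the tag "party", the toy data, and the plain gradient-descent trainer are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a binary logistic regression (one tag) with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def explain(w, b, x, names):
    """Return P(tag) and per-feature logit contributions w_j * x_j, largest first."""
    contribs = [(name, wj * xj) for name, wj, xj in zip(names, w, x)]
    contribs.sort(key=lambda item: abs(item[1]), reverse=True)
    return sigmoid(b + sum(c for _, c in contribs)), contribs

# Hypothetical perceptual features scaled to [0, 1].
names = ["danceability", "acousticness", "tempo"]
X = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.7], [0.2, 0.9, 0.3], [0.1, 0.8, 0.2]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
p_party, contributions = explain(w, b, [0.85, 0.15, 0.75], names)
```

Because the features carry human-readable names, `contributions` can be shown to a user directly as the reason a track received (or did not receive) the tag.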

Employing Crowdsourcing for Enriching a Music Knowledge Base in Higher Education

Jun 12, 2023

Vassilis Lyberatos, Spyridon Kantarelis, Eirini Kaldeli, Spyros Bekiaris, Panagiotis Tzortzis, Orfeas Menis-Mastromichalakis, Giorgos Stamou

Abstract:This paper describes the methodology followed and the lessons learned from employing crowdsourcing techniques as part of a homework assignment involving higher education students of computer science. Using a platform that supports crowdsourcing in the cultural heritage domain, students were asked to enrich the metadata associated with a selection of music tracks. The results of the campaign were further analyzed and exploited by students through the use of semantic web technologies. In total, 98 students participated in the campaign, contributing more than 6400 annotations concerning 854 tracks. The process also led to the creation of an openly available annotated dataset, which can be used to train machine learning models for music tagging. The campaign's results and the comments gathered through an online survey enable us to draw some useful insights about the benefits and challenges of integrating crowdsourcing into computer science curricula and how this can enhance students' engagement in the learning process.

* To be published in The 4th International Conference on Artificial Intelligence in Education Technology (AIET 2023), Berlin, Germany, 31 June-2 July 2023. The GitHub code for the created music dataset is available at https://github.com/vaslyb/MusicCrow
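
Turning many crowd annotations per track into a tagging dataset requires some aggregation rule, which the abstract does not specify. The sketch below assumes a simple agreement threshold over distinct annotators. The `(annotator, track, tag)` schema, both thresholds, and `aggregate_annotations` are hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_annotations(annotations, min_votes=2, min_agreement=0.5):
    """Turn raw crowd annotations into per-track tags by annotator agreement.

    annotations: iterable of (annotator_id, track_id, tag) triples.
    A tag is kept for a track if at least `min_votes` distinct annotators chose it
    and it was chosen by at least `min_agreement` of that track's annotators.
    """
    voters = defaultdict(set)     # track -> annotators who labeled it
    votes = defaultdict(Counter)  # track -> tag -> number of distinct annotators
    seen = set()
    for annotator, track, tag in annotations:
        voters[track].add(annotator)
        if (annotator, track, tag) in seen:
            continue  # ignore duplicate submissions
        seen.add((annotator, track, tag))
        votes[track][tag] += 1
    result = {}
    for track, counter in votes.items():
        n = len(voters[track])
        result[track] = sorted(
            tag for tag, c in counter.items() if c >= min_votes and c / n >= min_agreement
        )
    return result

annotations = [
    ("s1", "t1", "rock"), ("s2", "t1", "rock"), ("s3", "t1", "pop"),
    ("s1", "t1", "rock"),  # duplicate submission, counted once
    ("s1", "t2", "jazz"),
]
tags = aggregate_annotations(annotations)
print(tags)  # {'t1': ['rock'], 't2': []}
```

Requiring several distinct annotators trades coverage for label reliability. Track `t2` ends up with no tags because only one student labeled it.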

Synergy of Machine and Deep Learning Models for Multi-Painter Recognition

Apr 28, 2023

Vassilis Lyberatos, Paraskevi-Antonia Theofilou, Jason Liartis, Georgios Siolas

Abstract:The growing availability of digitized art collections has created the need to manage, analyze and categorize large amounts of data related to abstract concepts, highlighting a demanding problem of computer science and leading to new research perspectives. Advances in artificial intelligence and neural networks provide the right tools for this challenge. Analyzing artworks to extract features useful for specific tasks is at the heart of this effort. In the present work, we approach the problem of painter recognition in a set of digitized paintings, derived from the WikiArt repository, using transfer learning to extract the appropriate features and classical machine learning methods to evaluate the result. After testing and fine-tuning various models, we conclude that RegNet performs best at extracting features, while SVM gives the best classification of images by painter, with a performance of up to 85%. We also introduce a new large dataset for the painting recognition task, covering 62 artists, on which we achieve good results.

* GitHub Repository: https://github.com/jliartis/art-recognition
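
The pipeline above pairs a pretrained feature extractor with a classical classifier. The sketch below covers only the second stage. It assumes feature vectors have already been extracted, for example by a pretrained backbone such as RegNet, which is not shown. In place of whatever SVM implementation the authors used, it substitutes a small pure-Python linear SVM trained with the Pegasos sub-gradient method and wrapped one-vs-rest for several painters. The 2-D vectors and painter labels are hypothetical toy data.

```python
import random

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_pegasos(X, y, lam=0.01, iters=5000, seed=0):
    """Binary linear SVM (y in {-1, +1}) via Pegasos stochastic sub-gradient steps.

    A constant 1.0 is appended to every vector so the last weight acts as a bias
    (unlike textbook Pegasos, this bias is regularized too).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * dot(w, Xb[i])
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
        if margin < 1.0:  # hinge loss active: move w to increase this example's margin
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {
        c: train_pegasos(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }

def predict(models, x):
    """Pick the class whose SVM gives the highest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))

# Hypothetical 2-D stand-ins for extracted image features of three painters.
X = [[2.0, 0.0], [2.2, 0.1], [1.9, -0.1],
     [0.0, 2.0], [0.1, 2.2], [-0.1, 1.9],
     [-2.0, -2.0], [-2.1, -1.9], [-1.9, -2.2]]
labels = ["painter_a"] * 3 + ["painter_b"] * 3 + ["painter_c"] * 3
models = train_one_vs_rest(X, labels)
accuracy = sum(predict(models, x) == lab for x, lab in zip(X, labels)) / len(X)
```

Keeping the extractor frozen and training only a linear classifier on its outputs is what makes this setup cheap to tune across many backbones and classifiers.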
