Axel Marmoret

Univ. Rennes 1, Inria, CNRS, IRISA, France

Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm

Nov 30, 2023

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract: Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to "chorus", "verse", "solo", etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm, introduced by Marmoret et al. (2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal and time is sampled at the bar scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, in optimal conditions, the proposed algorithm achieves a level of performance which is competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.

* Transactions of the International Society for Music Information Retrieval, 6(1), 2023, 167-185
* 19 pages, 13 figures, 11 tables, 1 algorithm, published in Transactions of the International Society for Music Information Retrieval
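
The idea of segmenting a barwise self-similarity matrix with dynamic programming can be illustrated with a toy sketch. This is not the CBM algorithm from the paper: the cosine similarity is only one common choice (the abstract does not name its three similarity functions), and the block score used here (mean similarity inside the block times its size, minus a fixed `penalty` per segment) is a hypothetical stand-in for the paper's scoring.

```python
import numpy as np

def cosine_self_similarity(bar_features):
    """Cosine self-similarity matrix of shape (n_bars, n_bars)."""
    norms = np.linalg.norm(bar_features, axis=1, keepdims=True)
    normalized = bar_features / np.maximum(norms, 1e-12)
    return normalized @ normalized.T

def segment(ssm, penalty=0.5):
    """Toy dynamic program: choose boundaries maximizing the summed block scores.

    The score of a segment [start, end) is an assumption made for this sketch:
    mean similarity inside the diagonal block, times its size, minus `penalty`.
    """
    n = ssm.shape[0]
    best = np.full(n + 1, -np.inf)   # best[e]: best total score for bars [0, e)
    best[0] = 0.0
    previous = np.zeros(n + 1, dtype=int)
    for end in range(1, n + 1):
        for start in range(end):
            size = end - start
            score = ssm[start:end, start:end].mean() * size - penalty
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                previous[end] = start
    # Backtrack from the last bar to recover the boundaries.
    boundaries = [n]
    while boundaries[-1] > 0:
        boundaries.append(int(previous[boundaries[-1]]))
    return boundaries[::-1]

# Six synthetic bars: two of pattern a, three of pattern b, one of pattern a.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
bars = np.stack([a, a, b, b, b, a])
ssm = cosine_self_similarity(bars)
boundaries = segment(ssm)
print(boundaries)  # [0, 2, 5, 6]
```

In this synthetic case every homogeneous run of bars forms a block of ones on the diagonal, so the penalty only discourages splitting a run further; real feature matrices would need a tuned score and penalty.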

Polytopic Analysis of Music

Dec 22, 2022

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract: Structural segmentation of music refers to the task of finding a symbolic representation of the organisation of a song, reducing the musical flow to a partition of non-overlapping segments. Under this definition, the musical structure may not be unique, and may even be ambiguous. One way to resolve that ambiguity is to see this task as a compression process, and to consider the musical structure as the optimization of a given compression criterion. From that viewpoint, C. Guichaoua developed a compression-driven model for retrieving the musical structure, based on the "System and Contrast" model, and on polytopes, which are extensions of n-hypercubes. We present this model, which we call "polytopic analysis of music", along with a new open-source dedicated toolbox called MusicOnPolytopes (in Python). This model is also extended to the use of the Tonnetz as a relation system. Structural segmentation experiments are conducted on the RWC Pop dataset. Results show improvements compared to the previous ones, presented by C. Guichaoua.

* Work document

Convolutive Block-Matching Segmentation Algorithm with Application to Music Structure Analysis

Oct 27, 2022

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract: Music Structure Analysis (MSA) consists of representing a song in sections (such as "chorus", "verse", "solo", etc.), and can be seen as the retrieval of a simplified organization of the song. This work presents a new algorithm, called the Convolutive Block-Matching (CBM) algorithm, devoted to MSA. In particular, the CBM algorithm is a dynamic programming algorithm applied to autosimilarity matrices, a standard tool in MSA. In this work, autosimilarity matrices are computed from the feature representation of an audio signal, and time is sampled at the bar scale. We study three different similarity functions for the computation of autosimilarity matrices. We report that the proposed algorithm achieves a level of performance competitive with that of supervised state-of-the-art methods on 3 out of 4 metrics, while being fully unsupervised.

* 4 pages, 5 figures, 1 table. Submitted at ICASSP 2023. The associated toolbox is available at https://gitlab.inria.fr/amarmore/autosimilarity_segmentation

Semi-Supervised Convolutive NMF for Automatic Music Transcription

Feb 10, 2022

Haoran Wu, Axel Marmoret, Jérémy E. Cohen

Abstract: Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into a symbolic format, remains a difficult Music Information Retrieval task. In this work, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization (CNMF). In the semi-supervised setting, only a single recording of each individual note is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while suffering, however, from generalization issues.

* Submitted to the 2022 Sound and Music Computing (SMC) conference, 7 pages, 5 figures, 3 tables, code available at https://github.com/cohenjer/TransSSCNMF

Barwise Compression Schemes for Audio-Based Music Structure Analysis

Feb 10, 2022

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract: Music Structure Analysis (MSA) consists in segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, linear and non-linear compression schemes can be applied to barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective of learning latent representations specific to a given song. Such approaches rely on neither supervision nor annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of the barwise compression processing for MSA.

* Submitted at the 2022 Sound and Music Computing (SMC) conference, 8 pages, 6 figures, 1 table, code available at https://gitlab.inria.fr/amarmore/barwisemusiccompression. arXiv admin note: substantial text overlap with arXiv:2110.14437
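
As a rough illustration of linear barwise compression, the sketch below applies Principal Component Analysis to a matrix with one row per bar. It is a minimal example under stated assumptions, not the authors' toolbox: the function name `pca_compress`, the synthetic data and the choice of two components are all hypothetical, and PCA is computed directly from a singular value decomposition.

```python
import numpy as np

def pca_compress(bar_matrix, n_components):
    """Project each bar (one row) onto the top principal components.

    Returns the latent representation (n_bars, n_components) and the
    principal directions (n_components, n_features).
    """
    centered = bar_matrix - bar_matrix.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    latent = U[:, :n_components] * s[:n_components]
    return latent, Vt[:n_components]

# Synthetic "song": 10 bars of 40 features each, built from 2 hidden patterns.
rng = np.random.default_rng(0)
activations = rng.random((10, 2))
patterns = rng.random((2, 40))
bars = activations @ patterns

latent, components = pca_compress(bars, n_components=2)
print(latent.shape)  # (10, 2)
```

Each row of `latent` is a compressed description of one bar; such rows could then be compared with each other (for example in a self-similarity matrix) before a segmentation step.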

Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of audio signals

Nov 04, 2021

Axel Marmoret, Florian Voorwinden, Valentin Leplat, Jérémy E. Cohen, Frédéric Bimbot

Abstract: Nonnegative Tucker Decomposition (NTD), a tensor decomposition model, has received increased interest in recent years because of its ability to blindly extract meaningful patterns in tensor data. Nevertheless, existing algorithms to compute NTD are mostly designed for the Euclidean loss. On the other hand, NTD has recently proven to be a powerful tool in Music Information Retrieval. This work proposes a Multiplicative Updates algorithm to compute NTD with the beta-divergence loss, often considered a better loss for audio processing. We notably show how to efficiently implement the multiplicative rules using tensor algebra, a naive approach being intractable. Finally, we show on a Music Structure Analysis task that unsupervised NTD fitted with beta-divergence loss outperforms earlier results obtained with the Euclidean loss.

* 4 pages, 2 figures, 1 table, 1 algorithm, submitted to ICASSP 2022
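
To give a feel for multiplicative updates under the beta-divergence, the sketch below uses plain Nonnegative Matrix Factorization, a simpler matrix model swapped in for NTD. It is not the tensor algorithm proposed in the paper; the function names, the random test matrix and the default `beta=1.0` (Kullback-Leibler divergence) are assumptions chosen for illustration.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x | y) for positive arrays."""
    if beta == 1:   # Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:   # Itakura-Saito divergence
        return float(np.sum(x / y - np.log(x / y) - 1))
    num = x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)
    return float(np.sum(num / (beta * (beta - 1))))

def nmf_mu(V, rank, beta=1.0, n_iter=50, seed=0):
    """NMF (V ~ W @ H) with standard multiplicative updates for the beta-divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    losses = []
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ (V * WH**(beta - 2))) / (W.T @ WH**(beta - 1))
        WH = W @ H
        W *= ((V * WH**(beta - 2)) @ H.T) / (WH**(beta - 1) @ H.T)
        losses.append(beta_divergence(V, W @ H, beta))
    return W, H, losses

rng = np.random.default_rng(1)
V = rng.random((6, 8)) + 0.1          # strictly positive data
W, H, losses = nmf_mu(V, rank=2, beta=1.0)
```

Because each update multiplies the current factor by a ratio of nonnegative terms, `W` and `H` stay nonnegative without any projection step, which is the main appeal of this family of algorithms.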

Exploring single-song autoencoding schemes for audio-based music structure analysis

Oct 27, 2021

Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Abstract: The ability of deep neural networks to learn complex data relations and representations is established nowadays, but it generally relies on large sets of training data. This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song, which can then be used to infer the song structure. Such a model relies on neither supervision nor annotations, which are well known to be tedious to collect and often ambiguous in Music Structure Analysis. We report that the proposed unsupervised auto-encoding scheme achieves the level of performance of supervised state-of-the-art methods with 3 seconds tolerance when using a Log Mel spectrogram representation on the RWC-Pop dataset.

* 4 pages, 4 figures, 2 tables, submitted to ICASSP 2022

Multi-Channel Automatic Music Transcription Using Tensor Algebra

Jul 23, 2021

Axel Marmoret, Nancy Bertin, Jeremy Cohen

Abstract: Music is an art, perceived in unique ways by every listener, coming from acoustic signals. At the same time, standards such as musical scores exist to describe it. Even if humans can make this transcription, it is costly in terms of time and effort, even more so with the explosion of information following the rise of the Internet. In that sense, research is being directed towards Automatic Music Transcription. While this task is considered solved in the case of single notes, it is still open when notes overlap, forming chords. This report aims at developing some of the existing techniques for Music Transcription, particularly matrix factorization, and at introducing the concept of multi-channel automatic music transcription. This concept will be explored with mathematical objects called tensors.

* 40 pages, 14 figures, 5 tables, code can be found at: https://gitlab.inria.fr/amarmore/nonnegative-factorization

Uncovering audio patterns in music with Nonnegative Tucker Decomposition for structural segmentation

Apr 17, 2021

Axel Marmoret, Jérémy E. Cohen, Nancy Bertin, Frédéric Bimbot

Abstract: Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work further investigates the ability of Nonnegative Tucker Decomposition (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space, which can be interpreted from a musical viewpoint. The resulting features also turn out to be efficient for structural segmentation, leading to experimental results on the RWC Pop data set that potentially challenge state-of-the-art approaches that rely on extensive example-based learning schemes.

* 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020, 788-794
* 7 pages, 6 figures; code and experiment details available at https://gitlab.inria.fr/amarmore/musicntd/-/tree/0.1.0; experiment details also available at https://ax-le.github.io/resources/ISMIR2020/Notebooks_mainpage.html
