Sergio Oramas

Contrastive Learning for Cross-modal Artist Retrieval

Aug 12, 2023

Andres Ferraro, Jaehun Kim, Sergio Oramas, Andreas Ehmann, Fabien Gouyon

Abstract: Music retrieval and recommendation applications often rely on content features encoded as embeddings, which provide vector representations of items in a music dataset. Numerous complementary embeddings can be derived from processing items originally represented in several modalities, e.g., audio signals, user interaction data, or editorial data. However, data of any given modality might not be available for all items in any music dataset. In this work, we propose a method based on contrastive learning to combine embeddings from multiple modalities and explore the impact of the presence or absence of embeddings from diverse modalities in an artist similarity task. Experiments on two datasets suggest that our contrastive method outperforms single-modality embeddings and baseline algorithms for combining modalities, both in terms of artist retrieval accuracy and coverage. Improvements with respect to other methods are particularly significant for less popular query artists. We demonstrate our method successfully combines complementary information from diverse modalities, and is more robust to missing modality data (i.e., it better handles the retrieval of artists with different modality embeddings than the query artist's).

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

Oct 07, 2022

Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Abstract: In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affect the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done in an efficient manner with models containing fewer than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art in unsupervised learning -- and in some cases, supervised learning -- for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.

Artist Similarity with Graph Neural Networks

Jul 30, 2021

Filip Korzeniowski, Sergio Oramas, Fabien Gouyon

Abstract: Artist similarity plays an important role in organizing, understanding, and subsequently, facilitating discovery in large collections of music. In this paper, we present a hybrid approach to computing similarity between artists using graph neural networks trained with triplet loss. The novelty of using a graph neural network architecture is to combine the topology of a graph of artist connections with content features to embed artists into a vector space that encodes similarity. To evaluate the proposed method, we compile the new OLGA dataset, which contains artist similarities from AllMusic, together with content features from AcousticBrainz. With 17,673 artists, this is the largest academic artist similarity dataset that includes content-based features to date. Moreover, we also showcase the scalability of our approach by experimenting with a much larger proprietary dataset. Results show the superiority of the proposed approach over current state-of-the-art methods for music similarity. Finally, we hope that the OLGA dataset will facilitate research on data-driven models for artist similarity.

* Appears in Proc. of the International Society for Music Information Retrieval Conference 2021 (ISMIR 2021)
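
The abstract above mentions training with a triplet loss. As a rough illustration only, the following is a minimal sketch of the standard margin-based triplet loss on plain Python lists. It is not the authors' implementation: the function names, the Euclidean distance, and the margin value of 0.2 are all assumptions made for this example, and no graph neural network is involved.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss (hypothetical helper).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    The margin of 0.2 is an assumed default, not taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# A similar artist embedded close to the anchor and a dissimilar one far away: no penalty.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Roles swapped: the "similar" artist is far away, so the loss is positive.
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In this sketch the anchor, positive, and negative vectors stand in for artist embeddings; in a trained model they would be produced by the network rather than written by hand.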

Mood Classification Using Listening Data

Oct 22, 2020

Filip Korzeniowski, Oriol Nieto, Matthew McCallum, Minz Won, Sergio Oramas, Erik Schmidt

Abstract: The mood of a song is a highly relevant feature for exploration and recommendation in large collections of music. These collections tend to require automatic methods for predicting such moods. In this work, we show that listening-based features outperform content-based ones when classifying moods: embeddings obtained through matrix factorization of listening data appear to be more informative of a track mood than embeddings based on its audio content. To demonstrate this, we compile a subset of the Million Song Dataset, totalling 67k tracks, with expert annotations of 188 different moods collected from AllMusic. Our results on this novel dataset not only expose the limitations of current audio-based models, but also aim to foster further reproducible research on this timely topic.

* Appears in Proc. of the International Society for Music Information Retrieval Conference 2020 (ISMIR 2020)

Natural Language Processing for Music Knowledge Discovery

Jul 06, 2018

Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra

Abstract: Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago. In this work, we present different Natural Language Processing (NLP) approaches to harness the potential of these text collections for automatic music knowledge discovery, covering different phases in a prototypical NLP pipeline, namely corpus compilation, text-mining, information extraction, knowledge graph generation and sentiment analysis. Each of these approaches is presented alongside different use cases (i.e., flamenco, Renaissance and popular music) where large collections of documents are processed, and conclusions stemming from data-driven analyses are presented and discussed.

* Journal of New Music Research (2018)

A Deep Multimodal Approach for Cold-start Music Recommendation

Jul 24, 2017

Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra

Abstract: An increasing amount of digital music is being published daily. Music streaming services often ingest all available music, but this poses a challenge: how to recommend new artists for which prior knowledge is scarce? In this work we aim to address this so-called cold-start problem by combining text and audio information with user feedback data using deep network architectures. Our method is divided into three steps. First, artist embeddings are learned from biographies by combining semantics, text features, and aggregated usage data. Second, track embeddings are learned from the audio signal and available feedback data. Finally, artist and track embeddings are combined in a multimodal network. Results suggest that both splitting the recommendation problem between feature levels (i.e., artist metadata and audio track), and merging feature embeddings in a multimodal approach improve the accuracy of the recommendations.

* In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (DLRS 2017), collocated with RecSys 2017
