Yuntae Kim

Enriched Music Representations with Multiple Cross-modal Contrastive Learning

Apr 01, 2021

Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

Abstract: Modeling the various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information. Deep learning is commonly used to obtain representations from various sources of information, such as the audio, interactions between users and songs, or associated genre metadata. Recently, contrastive learning has led to representations that generalize better than those of traditional supervised methods. In this paper, we present a novel approach that combines multiple types of information related to music using cross-modal contrastive learning, allowing us to learn an audio feature from heterogeneous data simultaneously. We align the latent representations obtained from playlist-track interactions, genre metadata, and the tracks' audio by maximizing the agreement between these modality representations using a contrastive loss. We evaluate our approach on three tasks, namely genre classification, playlist continuation, and automatic tagging. We compare its performance with that of a baseline audio-based CNN trained to predict these modalities. We also study the importance of including multiple sources of information when training our embedding model. The results suggest that the proposed method outperforms the baseline in all three downstream tasks and achieves performance comparable to the state of the art.

* Accepted for publication to IEEE Signal Processing Letters
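
The abstract above describes aligning audio, genre, and playlist representations by maximizing their agreement with a contrastive loss. A minimal sketch of that idea follows, assuming a symmetric InfoNCE-style loss over L2-normalized embeddings. The function names, the temperature value, and the choice to sum the two audio-versus-modality terms are illustrative assumptions; the abstract does not specify the exact loss or the encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_loss(audio_emb, other_emb, temperature=0.1):
    # Row i of each matrix is assumed to describe the same track (a positive pair);
    # all other rows in the batch act as negatives.
    a = l2_normalize(audio_emb)
    b = l2_normalize(other_emb)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # audio -> other modality
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # other modality -> audio
    return 0.5 * (loss_ab + loss_ba)

def multi_modal_loss(audio_emb, genre_emb, playlist_emb, temperature=0.1):
    # Hypothetical combination: sum the audio-genre and audio-playlist terms.
    return (contrastive_loss(audio_emb, genre_emb, temperature)
            + contrastive_loss(audio_emb, playlist_emb, temperature))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(8, 16))
    # Matching pairs should give a lower loss than mismatched pairs.
    print(contrastive_loss(audio, audio), contrastive_loss(audio, audio[::-1]))
```

In a real training setup the embeddings would come from learned encoders and the loss would be minimized by gradient descent; random arrays are used here only to show the shape of the computation.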

Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

Jan 30, 2021

Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra (+1 more)

Abstract: One of the main limitations in the field of audio signal processing is the lack of large public datasets with audio representations and high-quality annotations, due to restrictions on copyrighted commercial music. We present the Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated with 30,652 different tags. All the data is gathered from Melon, a popular Korean streaming service. The dataset is suitable for music information retrieval tasks, in particular auto-tagging and automatic playlist continuation. Even though the latter can be addressed by collaborative filtering approaches, audio provides opportunities for research on track suggestions and for building systems resistant to the cold-start problem, for which we provide a baseline. Moreover, the playlists and annotations included in the Melon Playlist Dataset make it suitable for metric learning and representation learning.

* 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
