Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minhee Lee

Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

Oct 04, 2024

SeungHeon Doh, Minhee Lee, Dasaem Jeong, Juhan Nam

Figure 1 for Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

Figure 2 for Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

Figure 3 for Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

Figure 4 for Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval

Abstract:Text-to-Music Retrieval, finding music based on a given natural language query, plays a pivotal role in content discovery within extensive music databases. To address this challenge, prior research has predominantly focused on a joint embedding of music audio and text, utilizing it to retrieve music tracks that exactly match descriptive queries related to musical attributes (i.e. genre, instrument) and contextual elements (i.e. mood, theme). However, users also articulate a need to explore music that shares similarities with their favorite tracks or artists, such as \textit{I need a similar track to Superstition by Stevie Wonder}. To address these concerns, this paper proposes an improved Text-to-Music Retrieval model, denoted as TTMR++, which utilizes rich text descriptions generated with a finetuned large language model and metadata. To accomplish this, we obtained various types of seed text from several existing music tag and caption datasets and a knowledge graph dataset of artists and tracks. The experimental results show the effectiveness of TTMR++ in comparison to state-of-the-art music-text joint embedding models through a comprehensive evaluation involving various musical text queries.

* Accepted for publication at the IEEE ICASSP 2024

Via

Access Paper or Ask Questions

Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset

Mar 06, 2024

Pedro Ramoneda, Minhee Lee, Dasaem Jeong, J. J. Valero-Mas, Xavier Serra

Abstract:Automatically estimating the performance difficulty of a music piece represents a key process in music education to create tailored curricula according to the individual needs of the students. Given its relevance, the Music Information Retrieval (MIR) field depicts some proof-of-concept works addressing this task that mainly focuses on high-level music abstractions such as machine-readable scores or music sheet images. In this regard, the potential of directly analyzing audio recordings has been generally neglected, which prevents students from exploring diverse music pieces that may not have a formal symbolic-level transcription. This work pioneers in the automatic estimation of performance difficulty of music pieces on audio recordings with two precise contributions: (i) the first audio-based difficulty estimation dataset -- namely, Piano Syllabus (PSyllabus) dataset -- featuring 7,901 piano pieces across 11 difficulty levels from 1,233 composers; and (ii) a recognition framework capable of managing different input representations -- both unimodal and multimodal manners -- directly derived from audio to perform the difficulty estimation task. The comprehensive experimentation comprising different pre-training schemes, input modalities, and multi-task scenarios prove the validity of the proposal and establishes PSyllabus as a reference dataset for audio-based difficulty estimation in the MIR field. The dataset as well as the developed code and trained models are publicly shared to promote further research in the field.

Via

Access Paper or Ask Questions