Title: MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Sep 15, 2023

Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Abstract: Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
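
The single projection layer described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the dimensions, the random initialisation, the helper names (`make_projection`, `project`, `build_llm_input`), and the choice to prepend the projected music vectors to the text embeddings are all assumptions made for illustration. The frozen music encoder and the frozen language model are replaced by plain lists of numbers, so the only "trainable" pieces here are the weight matrix and bias.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Create the only trainable parameters in this sketch: a weight matrix and a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(frames, weight, bias):
    """Linearly map each music frame embedding (length in_dim) to length out_dim."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]


def build_llm_input(music_tokens, text_tokens):
    """Assumed layout: projected music vectors come first, then the text embeddings."""
    return music_tokens + text_tokens


# Toy sizes, chosen only to keep the example readable (hypothetical values).
MUSIC_DIM, LLM_DIM = 4, 6

weight, bias = make_projection(MUSIC_DIM, LLM_DIM)

# Stand-in for the frozen music encoder's output: 3 frames of MUSIC_DIM features.
music_frames = [[0.1 * (i + j) for j in range(MUSIC_DIM)] for i in range(3)]

# Stand-in for the frozen language model's prompt embeddings: 2 tokens of LLM_DIM features.
text_embeddings = [[0.0] * LLM_DIM for _ in range(2)]

sequence = build_llm_input(project(music_frames, weight, bias), text_embeddings)
print(len(sequence), len(sequence[0]))  # prints: 5 6
```

The point of the sketch is the shape change: every music frame leaves the projection with the same width as a language-model token embedding, so both can sit in one input sequence while the two large models stay frozen.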
