Rongchen Guo

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Sep 15, 2023

Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Abstract: Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
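
The "single projection layer" idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dimensions, the random initialisation, the helper names (`make_projection`, `project`, `build_llm_input`), and the choice to prepend the projected music frames to the text embeddings are all assumptions made for illustration. The point is only that a single trainable linear map carries frames from the audio encoder's feature space into the language model's embedding space, while both large models stay frozen.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Create weights for one linear layer (hypothetical toy initialisation)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(frames, weight, bias):
    """Apply y = W x + b to every frame (each frame is a list of floats)."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b
         for row, b in zip(weight, bias)]
        for frame in frames
    ]


def build_llm_input(music_frames, text_embeddings, weight, bias):
    """Assumed layout: projected music frames come first, then text tokens."""
    return project(music_frames, weight, bias) + text_embeddings


# Toy sizes standing in for the real encoder and LLM widths (assumptions).
MUSIC_DIM, LLM_DIM = 4, 6
weight, bias = make_projection(MUSIC_DIM, LLM_DIM)

music_frames = [[0.1 * (i + j) for j in range(MUSIC_DIM)] for i in range(3)]
text_embeddings = [[1.0] * LLM_DIM, [2.0] * LLM_DIM]

sequence = build_llm_input(music_frames, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))
```

In a setup like this, only `weight` and `bias` would receive gradient updates during training; the audio encoder and the language model would be used as fixed feature extractors.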

Comparative Visual Analytics for Assessing Medical Records with Sequence Embedding

Mar 23, 2020

Rongchen Guo, Takanori Fujiwara, Yiran Li, Kelly M. Lima, Soman Sen, Nam K. Tran, Kwan-Liu Ma

Abstract: Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.

* This is the author's version of the article that has been accepted in PacificVis 2020 Visualization Meets AI Workshop
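
The abstract above mentions a sequence alignment method for comparing records of different lengths but does not name it. As a rough illustration only, the sketch below uses dynamic time warping (DTW), a common alignment technique swapped in here as an assumption; the function name `dtw_align` and the scalar toy sequences are hypothetical, whereas real patient records would be multivariate embeddings compared with a suitable distance.

```python
def dtw_align(a, b, dist=lambda x, y: abs(x - y)):
    """Align two sequences with dynamic time warping.

    Returns the total alignment cost and the list of matched index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest way to align a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance

    # Walk back from the end to recover which elements were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Two toy "records" of different lengths: the second repeats one event.
total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

Here the repeated event in the longer sequence is matched to the same element of the shorter one, which is the kind of stretching that makes sequences of unequal length comparable side by side.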
