Title: DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

Jul 18, 2024

Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu

Abstract: Audio generation has attracted significant attention. Despite remarkable improvements in audio quality, existing models overlook the evaluation of diversity. This is partly due to the lack of a systematic framework for sound class diversity and of a matching dataset. To address these issues, we propose DiveSound, a novel framework for constructing multimodal datasets with an in-class diversified taxonomy, assisted by large language models. Since both textual and visual information can be used to guide diverse generation, DiveSound leverages multimodal contrastive representations in data construction. Our framework is highly autonomous and can be easily scaled up. We provide a text-audio-image aligned diversity dataset whose sound event class tags have an average of 2.42 subcategories. Text-to-audio experiments on the constructed dataset show a substantial increase in diversity when generation is guided by visual information.
