Title: DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training

Sep 14, 2024

Shengqiang Liu, Da Liu, Anna Wang, Zhiyu Zhang, Jie Gao, Yali Li

Figures 1-4 for DSCLAP: Domain-Specific Contrastive Language-Audio Pre-Training (images not reproduced here)

Abstract: Analyzing real-world multimodal signals is an essential and challenging task for intelligent voice assistants (IVAs). Mainstream approaches have achieved remarkable performance on various downstream IVA tasks using pre-trained audio models and text models. However, these models are pre-trained independently, and usually on tasks that differ from the target domain, which results in sub-optimal modality representations for downstream tasks. Moreover, in many domains it is extremely hard to collect enough language-audio pairs, and transcribing raw audio requires considerable professional expertise, making joint pre-training difficult or even infeasible. To address these pain points, we propose DSCLAP, a simple and effective framework that enables language-audio pre-training with only raw audio signals as input. Specifically, DSCLAP converts raw audio signals into text via an ASR system and combines a contrastive learning objective with a language-audio matching objective to align the audio with its ASR transcriptions. We pre-train DSCLAP on 12,107 hours of in-vehicle domain audio. Empirical results on two downstream tasks show that, while conceptually simple, DSCLAP significantly outperforms the baseline models on all metrics, showing great promise for domain-specific IVA applications.
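
The abstract names two training objectives, a contrastive objective and a language-audio matching objective, without giving their exact form. The sketch below is one common way such objectives are written, not the authors' implementation: a symmetric InfoNCE-style contrastive loss over a batch of paired audio and transcript embeddings, plus a binary cross-entropy matching loss. The function names, the temperature of 0.07, and the unweighted sum in `total_loss` are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss; row i of each input is assumed to be a pair.

    The temperature value is an assumption, not taken from the paper.
    """
    a = [l2_normalize(v) for v in audio_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(a)
    # Cosine similarity between every audio clip and every transcript.
    sim = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio-to-text direction: the matching transcript sits on the diagonal.
    a2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-audio direction: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    t2a = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)


def matching_loss(match_probs, labels):
    """Binary cross-entropy on predicted 'is this a true pair' probabilities."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(match_probs, labels):
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(labels)


def total_loss(audio_embs, text_embs, match_probs, labels):
    """Unweighted sum of the two objectives (the weighting is an assumption)."""
    return contrastive_loss(audio_embs, text_embs) + matching_loss(match_probs, labels)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    text = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for ASR transcript embeddings
    print(total_loss(audio, text, match_probs=[0.9, 0.1], labels=[1, 0]))
```

In a real setup the embeddings would come from an audio encoder and a text encoder applied to the raw audio and its ASR transcript, and the matching probabilities from a classifier over fused audio-text features; here they are plain lists so the loss arithmetic can be followed by hand.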
