Title: T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Apr 27, 2024

Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introduce T-CLAP, a temporal-enhanced CLAP model. We use Large Language Models (LLMs) and mixed-up strategies to generate temporal-contrastive captions for audio clips from extensive audio-text datasets. Subsequently, a new temporal-focused contrastive loss is designed to fine-tune the CLAP model by incorporating these synthetic data. We conduct comprehensive experiments and analysis in multiple downstream tasks. T-CLAP shows improved capability in capturing the temporal relationship of sound events and outperforms state-of-the-art models by a significant margin.
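
The abstract does not give the form of the temporal-focused loss, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a standard symmetric CLAP-style contrastive loss in which each audio clip additionally sees one temporal-contrastive caption (the same events in swapped order) as a hard negative. The function names, the "followed by" caption template, and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np


def make_temporal_caption_pair(first_event, second_event):
    """Hypothetical helper: build a caption and its order-swapped negative."""
    positive = f"{first_event} followed by {second_event}"
    negative = f"{second_event} followed by {first_event}"
    return positive, negative


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def temporal_contrastive_loss(audio_emb, text_emb, neg_text_emb, temperature=0.07):
    """Symmetric contrastive loss with one extra temporal negative per clip.

    audio_emb, text_emb, neg_text_emb: arrays of shape (batch, dim).
    Row i of text_emb is the correct caption for row i of audio_emb;
    row i of neg_text_emb is its order-swapped caption (an assumption
    of this sketch, not a detail taken from the paper).
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    n = l2_normalize(neg_text_emb)
    idx = np.arange(a.shape[0])

    sim_at = a @ t.T / temperature                                # (B, B)
    sim_neg = np.sum(a * n, axis=1, keepdims=True) / temperature  # (B, 1)

    # Audio-to-text: in-batch captions plus the clip's own temporal negative.
    logits_a2t = np.concatenate([sim_at, sim_neg], axis=1)        # (B, B + 1)
    loss_a2t = -log_softmax(logits_a2t, axis=1)[idx, idx].mean()

    # Text-to-audio: the usual in-batch contrastive term.
    loss_t2a = -log_softmax(sim_at.T, axis=1)[idx, idx].mean()

    return 0.5 * (loss_a2t + loss_t2a)
```

In this sketch, a model whose audio embedding cannot tell a caption from its order-swapped version is penalised, because the swapped caption then scores as highly as the correct one in the audio-to-text term.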

* Preprint submitted to IEEE MLSP 2024
