Zhisheng Zheng

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning

Oct 12, 2024
Via arXiv

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Oct 12, 2024
Via arXiv

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Jun 11, 2024
Via arXiv

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Feb 02, 2024
Via arXiv

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Jan 07, 2024
Via arXiv

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Dec 23, 2023
Via arXiv

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Sep 29, 2023
Via arXiv

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Sep 19, 2023
Via arXiv

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Aug 28, 2023
Via arXiv

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

Jun 15, 2023
Via arXiv