Zhisheng Zheng

Scaling Rich Style-Prompted Text-to-Speech Datasets

Mar 06, 2025
Via arXiv

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

Oct 12, 2024
Via arXiv

DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning

Oct 12, 2024
Via arXiv

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Jun 11, 2024
Via arXiv

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Feb 02, 2024
Via arXiv

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

Jan 07, 2024
Via arXiv

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Dec 23, 2023
Via arXiv

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Sep 29, 2023
Via arXiv

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Sep 19, 2023
Via arXiv

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Aug 28, 2023
Via arXiv