Picture for Haomin Zhang

Haomin Zhang

DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Add code
Mar 28, 2025
Viaarxiv icon

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Add code
Mar 28, 2025
Viaarxiv icon

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

Add code
Dec 23, 2024
Figure 1 for Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
Figure 2 for Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
Figure 3 for Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
Figure 4 for Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio
Viaarxiv icon

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Add code
Dec 12, 2024
Figure 1 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 2 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 3 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Figure 4 for YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls
Viaarxiv icon