Zihao Chen

for the ALFA study

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Mar 31, 2025

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Mar 28, 2025

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos

Mar 28, 2025

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Mar 28, 2025

ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset

Mar 24, 2025

Domain Adaptation for Japanese Sentence Embeddings with Contrastive Learning based on Synthetic Sentence Generation

Mar 12, 2025

Your contrastive learning problem is secretly a distribution alignment problem

Feb 27, 2025

ComposeOn Academy: Transforming Melodic Ideas into Complete Compositions Integrating Music Learning

Feb 21, 2025

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

Dec 23, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024