Chaofan Ding

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Mar 31, 2025

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Mar 28, 2025

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos

Mar 28, 2025

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Mar 28, 2025

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Jan 02, 2025

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Dec 23, 2024

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

Dec 23, 2024

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Dec 13, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation

Sep 26, 2024