Ming Tao

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Mar 16, 2026

SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Mar 12, 2026

SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Feb 08, 2026

SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation

Jan 06, 2026

SoulX-LiveTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation

Dec 31, 2025

Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression

Jun 11, 2025

Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image

May 20, 2025

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

Dec 07, 2024

Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout

Sep 11, 2024

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

Apr 09, 2024