Picture for Zhen Xiao

Zhen Xiao

SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis

Add code
May 11, 2026
Viaarxiv icon

Sub-Footprint Effect Correction in FW-LiDAR Point Clouds via Intra-Footprint Target Unmixing

Add code
May 11, 2026
Viaarxiv icon

FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

Add code
Mar 20, 2026
Viaarxiv icon

DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs

Add code
Jan 21, 2026
Viaarxiv icon

Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

Add code
Jan 05, 2026
Viaarxiv icon

MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

Add code
Feb 09, 2025
Viaarxiv icon

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Add code
Sep 06, 2024
Figure 1 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 2 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 3 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 4 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Viaarxiv icon

Exploring Robust Face-Voice Matching in Multilingual Environments

Add code
Jul 29, 2024
Figure 1 for Exploring Robust Face-Voice Matching in Multilingual Environments
Figure 2 for Exploring Robust Face-Voice Matching in Multilingual Environments
Figure 3 for Exploring Robust Face-Voice Matching in Multilingual Environments
Figure 4 for Exploring Robust Face-Voice Matching in Multilingual Environments
Viaarxiv icon

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Add code
Jul 17, 2024
Figure 1 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 2 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 3 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 4 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Viaarxiv icon

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Add code
Jun 26, 2024
Figure 1 for E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Figure 2 for E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Figure 3 for E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Figure 4 for E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Viaarxiv icon