Wei Xue

VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer

Feb 09, 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Feb 06, 2025

Every Angle Is Worth A Second Glance: Mining Kinematic Skeletal Structures from Multi-view Joint Cloud

Feb 05, 2025

SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model

Dec 04, 2024

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency

Nov 22, 2024

LEADRE: Multi-Faceted Knowledge Enhanced LLM Empowered Display Advertisement Recommender System

Nov 21, 2024

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues

Nov 05, 2024

EVA: An Embodied World Model for Future Video Anticipation

Oct 20, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024