Wei Xue

SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model

Dec 04, 2024
Via arXiv

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency

Nov 22, 2024
Via arXiv

LEADRE: Multi-Faceted Knowledge Enhanced LLM Empowered Display Advertisement Recommender System

Nov 21, 2024
Via arXiv

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues

Nov 05, 2024
Via arXiv

EVA: An Embodied World Model for Future Video Anticipation

Oct 20, 2024
Via arXiv

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024
Via arXiv

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Oct 14, 2024
Via arXiv

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer

Oct 07, 2024
Via arXiv

You Know What I'm Saying -- Jailbreak Attack via Implicit Reference

Oct 04, 2024
Via arXiv

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

Sep 16, 2024
Via arXiv