Picture for Wei Xue

Wei Xue

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues

Add code
Nov 05, 2024
Viaarxiv icon

EVA: An Embodied World Model for Future Video Anticipation

Add code
Oct 20, 2024
Viaarxiv icon

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Add code
Oct 16, 2024
Viaarxiv icon

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Add code
Oct 14, 2024
Figure 1 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 2 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 3 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 4 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Viaarxiv icon

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer

Add code
Oct 07, 2024
Viaarxiv icon

You Know What I'm Saying -- Jailbreak Attack via Implicit Reference

Add code
Oct 04, 2024
Viaarxiv icon

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

Add code
Sep 16, 2024
Viaarxiv icon

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

Add code
Sep 04, 2024
Viaarxiv icon

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Add code
Aug 30, 2024
Figure 1 for Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Figure 2 for Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Figure 3 for Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Figure 4 for Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Viaarxiv icon

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

Add code
Aug 27, 2024
Viaarxiv icon