Xudong Liu

WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

Jan 13, 2026

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Dec 18, 2025

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

May 26, 2025

Caesar: A Low-deviation Compression Approach for Efficient Federated Learning

Dec 28, 2024

RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model

Dec 27, 2024

XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation

Dec 24, 2024

Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Jun 26, 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024

SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation

May 28, 2024

LEOD: Label-Efficient Object Detection for Event Cameras

Nov 29, 2023