Shitong Xu

CycleVLA: Proactive Self-Correcting Vision-Language-Action Models via Subtask Backtracking and Minimum Bayes Risk Decoding

Jan 05, 2026
Via arXiv

Efficient and Microphone-Fault-Tolerant 3D Sound Source Localization

May 27, 2025
Via arXiv

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments

Feb 23, 2025
Via arXiv

SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field

Jun 16, 2024
Via arXiv

CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning

Oct 10, 2022
Via arXiv