Yong Xu

Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval

Oct 26, 2024

MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging

Oct 18, 2024

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

Oct 02, 2024

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

Sep 30, 2024

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Sep 17, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Sep 01, 2024

Advancing Multi-talker ASR Performance with Large Language Models

Aug 30, 2024

Deep Code Search with Naming-Agnostic Contrastive Multi-View Learning

Aug 18, 2024

OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction

Aug 16, 2024

Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning

Jul 13, 2024