Xiaofei Wang

GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

Mar 26, 2025
Via arXiv

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

Mar 26, 2025
Via arXiv

Joint Modelling Histology and Molecular Markers for Cancer Classification

Feb 11, 2025
Via arXiv

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Feb 04, 2025
Via arXiv

Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings

Jan 28, 2025
Via arXiv

Imperceptible Adversarial Attacks on Point Clouds Guided by Point-to-Surface Field

Dec 26, 2024
Via arXiv

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Dec 17, 2024
Via arXiv

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024
Via arXiv

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024
Via arXiv

Exploring Robust Face-Voice Matching in Multilingual Environments

Jul 29, 2024
Via arXiv