Picture for Shi-Xiong Zhang

Shi-Xiong Zhang

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Add code
Oct 16, 2024
Figure 1 for WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Figure 2 for WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Figure 3 for WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Figure 4 for WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Viaarxiv icon

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Add code
Oct 05, 2024
Viaarxiv icon

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Add code
Sep 17, 2024
Viaarxiv icon

Comparing Discrete and Continuous Space LLMs for Speech Recognition

Add code
Sep 01, 2024
Figure 1 for Comparing Discrete and Continuous Space LLMs for Speech Recognition
Figure 2 for Comparing Discrete and Continuous Space LLMs for Speech Recognition
Figure 3 for Comparing Discrete and Continuous Space LLMs for Speech Recognition
Figure 4 for Comparing Discrete and Continuous Space LLMs for Speech Recognition
Viaarxiv icon

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Add code
Sep 01, 2024
Figure 1 for LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Figure 2 for LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Figure 3 for LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Figure 4 for LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Viaarxiv icon

Advancing Multi-talker ASR Performance with Large Language Models

Add code
Aug 30, 2024
Viaarxiv icon

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

Add code
Jun 17, 2024
Viaarxiv icon

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

Add code
Oct 31, 2023
Viaarxiv icon

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

Add code
Oct 25, 2023
Viaarxiv icon

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

Add code
Sep 23, 2023
Viaarxiv icon