Siyin Wang

SICL-AT: Another way to adapt Auditory LLM to low-resource task

Jan 26, 2026
Via arXiv

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Nov 19, 2025
Via arXiv

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

Nov 14, 2025
Via arXiv

ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning

May 21, 2025
Via arXiv

VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

Apr 12, 2025
Via arXiv

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

Mar 26, 2025
Via arXiv

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

Mar 13, 2025
Via arXiv

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Jan 27, 2025
Via arXiv

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

Nov 27, 2024
Via arXiv

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

Sep 25, 2024
Via arXiv