Yang Xiang

TEn-CATS: Text-Enriched Audio-Visual Video Parsing with Multi-Scale Category-Aware Temporal Graph

Sep 04, 2025
Via arXiv

Uncertainty-Aware Semantic Decoding for LLM-Based Sequential Recommendation

Aug 10, 2025
Via arXiv

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

Aug 10, 2025
Via arXiv

MVAR: MultiVariate AutoRegressive Air Pollutants Forecasting Model

Jul 16, 2025
Via arXiv

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Jul 16, 2025
Via arXiv

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

Jun 26, 2025
Via arXiv

Evaluating and Steering Modality Preferences in Multimodal Large Language Model

May 27, 2025
Via arXiv

XBOUND: Exploring the Capability Boundaries of Device-Control Agents through Trajectory Tree Exploration

May 27, 2025
Via arXiv

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving

May 24, 2025
Via arXiv

A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model

May 20, 2025
Via arXiv