Picture for Xiaohui Li

Xiaohui Li

Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models

Add code
Mar 15, 2026
Viaarxiv icon

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Add code
Feb 27, 2026
Viaarxiv icon

UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

Add code
Feb 15, 2026
Viaarxiv icon

Toward Generalizable Deblurring: Leveraging Massive Blur Priors with Linear Attention for Real-World Scenarios

Add code
Jan 10, 2026
Viaarxiv icon

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Add code
Jan 07, 2026
Viaarxiv icon

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Add code
Dec 25, 2025
Figure 1 for UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Figure 2 for UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Figure 3 for UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Figure 4 for UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
Viaarxiv icon

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

Add code
Nov 13, 2025
Figure 1 for MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Figure 2 for MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Figure 3 for MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Figure 4 for MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Viaarxiv icon

FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Add code
Oct 14, 2025
Figure 1 for FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Figure 2 for FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Figure 3 for FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Figure 4 for FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Viaarxiv icon

Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance

Add code
Aug 10, 2025
Figure 1 for Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance
Figure 2 for Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance
Figure 3 for Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance
Figure 4 for Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance
Viaarxiv icon

The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Add code
Jul 10, 2025
Figure 1 for The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
Figure 2 for The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
Figure 3 for The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
Figure 4 for The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
Viaarxiv icon