Yuan Wu

Senior Member, IEEE

How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

Jun 24, 2026
Via arXiv

AuRA: Internalizing Audio Understanding into LLMs as LoRA

Jun 09, 2026
Via arXiv

Convergence Rates for Neural-Network Estimation with Current-Status Data

Jun 08, 2026
Via arXiv

VCIFBench: Evaluating Complex Instruction Following for Video Understanding

Jun 03, 2026
Via arXiv

A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs

Jun 03, 2026
Via arXiv

Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

May 06, 2026
Via arXiv

AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild

Feb 10, 2026
Via arXiv

AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

Jan 17, 2026
Via arXiv

Align, Don't Divide: Revisiting the LoRA Architecture in Multi-Task Learning

Aug 07, 2025
Via arXiv

Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

Aug 06, 2025
Via arXiv