Shuo Xing

Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding

Jan 31, 2026
Via arXiv

BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature

Jan 12, 2026
Via arXiv

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Jun 18, 2025
Via arXiv

SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems

Jun 09, 2025
Via arXiv

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

May 29, 2025
Via arXiv

Generative AI for Autonomous Driving: Frontiers and Opportunities

May 13, 2025
Via arXiv

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

Mar 31, 2025
Via arXiv

Can Large Vision Language Models Read Maps Like a Human?

Mar 18, 2025
Via arXiv

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

Mar 14, 2025
Via arXiv

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

Feb 18, 2025
Via arXiv