Hongsheng Li

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

Oct 16, 2025
Via arXiv

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Oct 14, 2025
Via arXiv

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Oct 06, 2025
Via arXiv

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Sep 26, 2025
Via arXiv

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Sep 26, 2025
Via arXiv

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Sep 11, 2025
Via arXiv

GLEAM: Learning to Match and Explain in Cross-View Geo-Localization

Sep 09, 2025
Via arXiv

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

Sep 09, 2025
Via arXiv

Alignment with Fill-In-the-Middle for Enhancing Code Generation

Aug 27, 2025
Via arXiv

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Aug 13, 2025
Via arXiv