Sibo Song

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Jan 08, 2026
Via arXiv

Revisiting Multimodal Positional Encoding in Vision-Language Models

Oct 27, 2025
Via arXiv

Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning

Aug 26, 2025
Via arXiv

CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making

Jun 15, 2025
Via arXiv

OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding

Apr 20, 2025
Via arXiv

Generative Compositor for Few-Shot Visual Information Extraction

Mar 21, 2025
Via arXiv

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Feb 22, 2025
Via arXiv

Qwen2.5-VL Technical Report

Feb 19, 2025
Via arXiv

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Mar 28, 2024
Via arXiv

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Mar 29, 2023
Via arXiv