Yanpeng Sun

MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams

Mar 26, 2025

Visual Position Prompt for MLLM based Visual Grounding

Mar 19, 2025

Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs

Jan 11, 2025

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Dec 18, 2024

Continual SFT Matches Multimodal RLHF with Negative Supervision

Nov 22, 2024

Improving Multi-modal Large Language Model through Boosting Vision Capabilities

Oct 17, 2024

CSGO: Content-Style Composition in Text-to-Image Generation

Sep 04, 2024

VRP-SAM: SAM with Visual Reference Prompt

Feb 27, 2024

Exploring Effective Factors for Improving Visual In-Context Learning

Apr 10, 2023

Self-Supervised Guided Segmentation Framework for Unsupervised Anomaly Detection

Sep 26, 2022