Shan Ning

WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition

Mar 10, 2026
Via arXiv

Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum

Mar 05, 2026
Via arXiv

DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations

Jan 02, 2026
Via arXiv

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Jan 04, 2024
Via arXiv

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Mar 29, 2023
Via arXiv