Xirong Li

EI: Early Intervention for Multimodal Imaging based Disease Recognition

Mar 18, 2026
Via arXiv

SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval

Mar 11, 2026
Via arXiv

Cross-modal Fundus Image Registration under Large FoV Disparity

Dec 14, 2025
Via arXiv

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval

Sep 05, 2025
Via arXiv

Multi-Object Sketch Animation by Scene Decomposition and Motion Planning

Mar 25, 2025
Via arXiv

FunBench: Benchmarking Fundus Reading Skills of MLLMs

Mar 02, 2025
Via arXiv

Convolutional Prompting for Broad-Domain Retinal Vessel Segmentation

Dec 24, 2024
Via arXiv

Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization

Nov 15, 2024
Via arXiv

Beyond Coarse-Grained Matching in Video-Text Retrieval

Oct 17, 2024
Via arXiv

Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

Oct 15, 2024
Via arXiv