Image Retrieval


KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking

Apr 21, 2025
Via arXiv

Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models

Apr 22, 2025
Via arXiv

Improving Sound Source Localization with Joint Slot Attention on Image and Audio

Apr 21, 2025
Via arXiv

A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling

Apr 19, 2025
Via arXiv

REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models

Apr 20, 2025
Via arXiv

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation

Apr 18, 2025
Via arXiv

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs

Apr 17, 2025
Via arXiv

Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design

Apr 18, 2025
Via arXiv

Perception Encoder: The best visual embeddings are not at the output of the network

Apr 17, 2025
Via arXiv

TMCIR: Token Merge Benefits Composed Image Retrieval

Apr 15, 2025
Via arXiv