Rongrong Ji

Xiamen University, Peng Cheng Laboratory

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning

Oct 09, 2025
Via arXiv

CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling

Sep 11, 2025
Via arXiv

Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval

Aug 27, 2025
Via arXiv

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference

Aug 25, 2025
Via arXiv

DS$^2$Net: Detail-Semantic Deep Supervision Network for Medical Image Segmentation

Aug 06, 2025
Via arXiv

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Aug 01, 2025
Via arXiv

Towards Universal Modal Tracking with Online Dense Temporal Token Learning

Jul 27, 2025
Via arXiv

GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models

Jul 16, 2025
Via arXiv

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Jul 03, 2025
Via arXiv

DeOcc-1-to-3: 3D De-Occlusion from a Single Image via Self-Supervised Multi-View Diffusion

Jun 26, 2025
Via arXiv