Zhanpeng Chen

From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs

Feb 13, 2025
Via arXiv

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

Jan 19, 2025
Via arXiv

MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

Jul 31, 2024
Via arXiv