Jia-Huei Ju

Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction

May 12, 2025

Jingfen Qiao, Jia-Huei Ju, Xinyu Ma, Evangelos Kanoulas, Andrew Yates

Abstract: Visual Document Retrieval (VDR) is an emerging research area that focuses on encoding and retrieving document images directly, bypassing the dependence on Optical Character Recognition (OCR) for document search. A recent advance in VDR was introduced by ColPali, which significantly improved retrieval effectiveness through a late interaction mechanism. ColPali's approach demonstrated substantial performance gains over existing baselines that do not use late interaction on an established benchmark. In this study, we investigate the reproducibility and replicability of VDR methods with and without late interaction mechanisms by systematically evaluating their performance across multiple pre-trained vision-language models. Our findings confirm that late interaction yields considerable improvements in retrieval effectiveness; however, it also introduces computational inefficiencies during inference. Additionally, we examine the adaptability of VDR models to textual inputs and assess their robustness across text-intensive datasets within the proposed benchmark, particularly when scaling the indexing mechanism. Furthermore, we investigate the specific contributions of late interaction by examining query-patch matching in the context of visual document retrieval. We find that although query tokens cannot explicitly match image patches as they match terms in text retrieval, they tend to match patches that contain visually similar tokens, or the patches surrounding them.
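As a rough illustration of the late interaction idea mentioned in the abstract, the sketch below scores a page by summing, over query tokens, the maximum dot product with any page patch (the MaxSim operator popularized by ColBERT). It is a minimal sketch, not the authors' implementation: the function names, the toy 2-dimensional embeddings, and the use of plain Python lists are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_patches: List[Vector]) -> float:
    """Late interaction score: for each query token, take the best-matching
    patch similarity, then sum over query tokens."""
    if not doc_patches:
        raise ValueError("document must have at least one patch embedding")
    return sum(max(dot(q, p) for p in doc_patches) for q in query_tokens)


def best_patch_per_token(query_tokens: List[Vector], doc_patches: List[Vector]) -> List[int]:
    """Index of the highest-scoring patch for each query token
    (a simple way to inspect query-patch matching)."""
    return [
        max(range(len(doc_patches)), key=lambda j: dot(q, doc_patches[j]))
        for q in query_tokens
    ]


# Toy embeddings (hypothetical values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5], [0.2, 0.1]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
matches_a = best_patch_per_token(query, page_a)
```

Each page score requires one dot product per (query token, patch) pair, which is one way to see why storing and comparing many patch vectors per page can be more expensive at inference time than comparing a single vector per page.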


On the Reproducibility of Learned Sparse Retrieval Adaptations for Long Documents

Mar 31, 2025

Emmanouil Georgios Lionis, Jia-Huei Ju

Abstract: Document retrieval is one of the most challenging tasks in Information Retrieval. It requires handling longer contexts, often resulting in higher query latency and increased computational overhead. Recently, Learned Sparse Retrieval (LSR) has emerged as a promising approach to address these challenges. Prior work has proposed adapting LSR to longer documents by aggregating segmented documents using different post-hoc methods, including n-grams and proximity scores, adjusting representations, and learning to ensemble all signals. In this study, we aim to reproduce and examine the mechanisms of adapting LSR for long documents. Our reproducibility experiments confirm the importance of specific segments, with the first segment consistently dominating document retrieval performance. Furthermore, we re-evaluate recently proposed methods, ExactSDM and SoftSDM, across varying document lengths, from short (up to 2 segments) to longer (3+ segments). We also design multiple analyses to probe the reproduced methods and shed light on the impact of global information on adapting LSR to longer contexts. The complete code and implementation for this project are available at: https://github.com/lionisakis/Reproducibilitiy-lsr-long.

* ECIR 2025, Part IV, LNCS 15575
* This is a preprint of our paper accepted at ECIR 2025


Improving Conversational Passage Re-ranking with View Ensemble

Apr 26, 2023

Jia-Huei Ju, Sheng-Chieh Lin, Ming-Feng Tsai, Chuan-Ju Wang

Abstract: This paper presents ConvRerank, a conversational passage re-ranker that employs a newly developed pseudo-labeling approach. Our proposed view-ensemble method enhances the quality of pseudo-labeled data, thus improving the fine-tuning of ConvRerank. Our experimental evaluation on benchmark datasets shows that combining ConvRerank with a conversational dense retriever in a cascaded manner achieves a good balance between effectiveness and efficiency. Compared to baseline methods, our cascaded pipeline demonstrates lower latency and higher top-ranking effectiveness. Furthermore, the in-depth analysis confirms the potential of our approach to improving the effectiveness of conversational search.

* SIGIR 2023


Text-to-Text Multi-view Learning for Passage Re-ranking

Apr 29, 2021

Jia-Huei Ju, Jheng-Hong Yang, Chuan-Ju Wang

Abstract: Recently, much progress in natural language processing has been driven by deep contextualized representations pretrained on large corpora. Typically, fine-tuning these pretrained models for a specific downstream task is based on single-view learning, which is inadequate because a sentence can be interpreted differently from different perspectives. Therefore, in this work, we propose a text-to-text multi-view learning framework that incorporates an additional view, the text generation view, into a typical single-view passage ranking model. Empirically, the proposed approach improves ranking performance compared to its single-view counterpart. Ablation studies are also reported in the paper.

* Accepted as a short paper at SIGIR 2021
