Title: What are the limits of cross-lingual dense passage retrieval for low-resource languages?

Aug 21, 2024

Jie Wu, Zhaochun Ren, Suzan Verberne

Abstract: In this paper, we analyze the capabilities of the multi-lingual Dense Passage Retriever (mDPR) for extremely low-resource languages. In the Cross-lingual Open-Retrieval Answer Generation (CORA) pipeline, mDPR achieves success on multilingual open QA benchmarks across 26 languages, of which 9 were unseen during training. These results are promising for Question Answering (QA) for low-resource languages. We focus on two extremely low-resource languages for which mDPR performs poorly: Amharic and Khmer. We collect and curate datasets to train mDPR models using Translation Language Modeling (TLM) and question-passage alignment. We also investigate the effect of our extension on the language distribution in the retrieval results. Our results on the MKQA and AmQA datasets show that language alignment brings improvements to mDPR for the low-resource languages, but the improvements are modest and the results remain low. We conclude that fulfilling CORA's promise to enable multilingual open QA in extremely low-resource settings is challenging because the model, the data, and the evaluation approach are intertwined. Hence, all three need attention in follow-up work. We release our code for reproducibility and future work: https://anonymous.4open.science/r/Question-Answering-for-Low-Resource-Languages-B13C/
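
The abstract mentions examining the language distribution in the retrieval results. As a rough illustration only, and not the authors' code, the sketch below ranks a handful of toy passages by inner product with a query vector (the scoring used by DPR-style retrievers) and reports the share of each language among the top-k hits. The vectors, the `lang` and `vec` field names, and the helper functions are all hypothetical; a real setup would take embeddings from the trained encoders.

```python
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec, passages, k):
    """Rank passages by inner product with the query and keep the best k."""
    ranked = sorted(passages, key=lambda p: dot(query_vec, p["vec"]), reverse=True)
    return ranked[:k]


def language_distribution(retrieved):
    """Fraction of retrieved passages written in each language."""
    counts = Counter(p["lang"] for p in retrieved)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


# Toy data (hypothetical): 2-d embeddings tagged with a language code.
query = [1.0, 0.0]
passages = [
    {"id": "p1", "lang": "en", "vec": [0.9, 0.1]},
    {"id": "p2", "lang": "am", "vec": [0.8, 0.3]},
    {"id": "p3", "lang": "en", "vec": [0.7, 0.2]},
    {"id": "p4", "lang": "km", "vec": [0.1, 0.9]},
]

hits = top_k(query, passages, k=3)
dist = language_distribution(hits)
print([p["id"] for p in hits], dist)
```

Comparing such a distribution before and after additional training would be one simple way to see whether retrieval shifts toward passages in the target language.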
