Stefano Campese

Datasets for Multilingual Answer Sentence Selection

Jun 14, 2024

Matteo Gabburo, Stefano Campese, Federico Agostini, Alessandro Moschitti


Abstract: Answer Sentence Selection (AS2) is a critical task for designing effective retrieval-based Question Answering (QA) systems. Most advancements in AS2 focus on English due to the scarcity of annotated datasets for other languages. This lack of resources prevents the training of effective AS2 models in different languages, creating a performance gap between QA systems in English and other locales. In this paper, we introduce new high-quality datasets for AS2 in five European languages (French, German, Italian, Portuguese, and Spanish), obtained through supervised Automatic Machine Translation (AMT) of existing English AS2 datasets such as ASNQ, WikiQA, and TREC-QA using a Large Language Model (LLM). We evaluated our approach and the quality of the translated datasets through multiple experiments with different Transformer architectures. The results indicate that our datasets are pivotal in producing robust and powerful multilingual AS2 models, significantly contributing to closing the performance gap between English and other languages.


QUADRo: Dataset and Models for QUestion-Answer Database Retrieval

Mar 30, 2023

Stefano Campese, Ivano Lauriola, Alessandro Moschitti


Abstract: An effective paradigm for building Automated Question Answering systems is the re-use of previously answered questions, e.g., for FAQs or forum applications. Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions. In this paper, we scale this approach to the open domain, making it competitive with other standard methods, e.g., unstructured document or graph-based. For this purpose, we (i) build a large-scale DB of 6.3M q/a pairs, using public questions, (ii) design a new system based on neural IR and a q/a pair reranker, and (iii) construct training and test data to perform comparative experiments with our models. We demonstrate that Transformer-based models using (q,a) pairs outperform models only based on question representation, for both neural search and reranking. Additionally, we show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the BING search engine, demonstrating the challenge of finding relevant information. Finally, we make our data and models available for future research.
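The retrieve-then-rerank idea in this abstract can be illustrated with a toy sketch. This is not the authors' system: the paper uses neural IR and a Transformer-based reranker, whereas the sketch below swaps in bag-of-words cosine similarity so that it runs with the standard library alone. The function names (`retrieve`, `rerank`, `answer`) and the three-entry `DB` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(target, db, k=2):
    """First stage: score stored questions only, keep the top k pairs."""
    tq = Counter(tokenize(target))
    scored = [(cosine(tq, Counter(tokenize(q))), q, a) for q, a in db]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def rerank(target, candidates):
    """Second stage: rescore each candidate using question AND answer text."""
    tq = Counter(tokenize(target))
    rescored = [
        (cosine(tq, Counter(tokenize(q + " " + a))), q, a)
        for _, q, a in candidates
    ]
    rescored.sort(key=lambda item: item[0], reverse=True)
    return rescored


def answer(target, db, k=2):
    """Return the answer of the best reranked q/a pair, or None if db is empty."""
    ranked = rerank(target, retrieve(target, db, k))
    return ranked[0][2] if ranked else None


# Hypothetical toy database of (question, answer) pairs.
DB = [
    ("What is the capital of France?", "Paris is the capital of France."),
    ("How tall is Mount Everest?", "Mount Everest is about 8,849 metres tall."),
    ("Who wrote Hamlet?", "William Shakespeare wrote Hamlet."),
]

print(answer("Which city is the capital of France?", DB))
```

The two stages mirror the distinction the abstract draws: the first scores the stored question alone, while the second also looks at the answer text, which is the (q,a) signal the authors report as beneficial for their Transformer-based models.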
