Mete Sertkan

Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation

May 24, 2023

Mete Sertkan, Sophia Althammer, Sebastian Hofstätter

Abstract: In this paper, we introduce Ranger, a toolkit to facilitate the easy use of effect-size-based meta-analysis for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agnostic toolkit that combines the effect of a treatment on multiple tasks into one statistical evaluation, allowing for comparison of metrics and computation of an overall summary effect. Our toolkit produces publication-ready forest plots that enable clear communication of evaluation results over multiple tasks. Our goal with the ready-to-use Ranger toolkit is to promote robust, effect-size-based evaluation and improve evaluation standards in the community. We provide two case studies for common IR and NLP settings to highlight Ranger's benefits.
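To make the idea of an "overall summary effect" concrete, here is a minimal sketch of a random-effects meta-analysis using the DerSimonian-Laird estimator. It is not Ranger's API: the function name `summary_effect` and its inputs (one effect size and one variance per task) are assumptions chosen for illustration, and the abstract does not state which estimator the toolkit uses.

```python
import math

def summary_effect(effects, variances):
    """Combine per-task effect sizes into one summary effect.

    Hypothetical helper (not part of Ranger): uses the DerSimonian-Laird
    random-effects estimator and a normal-approximation 95% interval.
    """
    # Inverse-variance (fixed-effect) weights and mean.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

    # Cochran's Q measures heterogeneity between tasks.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    # Between-task variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each task's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean, (mean - 1.96 * se, mean + 1.96 * se), tau2

# Made-up effect sizes for three tasks, for illustration only.
mean, (low, high), tau2 = summary_effect([0.30, 0.10, 0.25], [0.01, 0.02, 0.015])
print(f"summary effect {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}], tau^2 {tau2:.4f}")
```

A forest plot would draw each task's effect with its own interval and the summary effect as the final row.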

* Accepted at ACL 2023 (System Demonstrations)

The Role of Bias in News Recommendation in the Perception of the Covid-19 Pandemic

Sep 15, 2022

Thomas Elmar Kolb, Irina Nalis, Mete Sertkan, Julia Neidhardt

Abstract: News recommender systems (NRs) have been shown to shape public discourse and to enforce behaviors that have a critical, oftentimes detrimental effect on democracies. Earlier research on media bias has revealed its strong impact on opinions and preferences. Responsible NRs are supposed to have depolarizing capacities once they go beyond accuracy measures. We performed sequence prediction using the BERT4Rec algorithm to investigate the interplay of news coverage and user behavior. Based on live data and on training with a large data set from one news outlet, we investigated "event bursts", the "rally around the flag" effect, and "filter bubbles" in an interdisciplinary approach between data science and psychology. Potentials for fair NRs that go beyond accuracy measures are outlined via training of the models with a large data set of articles, keywords, and user behavior. The development of the news coverage and user behavior during the COVID-19 pandemic, from primarily medical to broader political content and debates, was traced. Our study provides first insights for the future development of responsible news recommendation that acknowledges user preferences while stimulating diversity and accountability instead of accuracy only.

* Accepted for presentation at the 5th FAccTRec Workshop on Responsible Recommendation (FAccTRec '22). Revised based on the reviewers' feedback.

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Mar 24, 2022

Sebastian Hofstätter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury

Abstract: Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reductions dramatically lower ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors per document by learning unique whole-word representations for the terms in each document and learning to identify and remove word representations that are not essential to effective scoring. We employ an explicit multi-task, multi-stage training to facilitate using very small vector dimensions. Results on the MS MARCO and TREC-DL collections show that ColBERTer can reduce the storage footprint by up to 2.5x, while maintaining effectiveness. With just one dimension per token in its smallest setting, ColBERTer achieves index storage parity with the plaintext size, with very strong effectiveness results. Finally, we demonstrate ColBERTer's robustness on seven high-quality out-of-domain collections, yielding statistically significant gains over traditional retrieval baselines.
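For readers unfamiliar with late interaction, the following minimal sketch shows the MaxSim scoring used by ColBERT-style models: each query vector is matched to its best-scoring document vector, and the maxima are summed. It covers only this scoring step, not ColBERTer's fusion, whole-word aggregation, or training; the function name `maxsim_score` and the toy vectors are assumptions for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query vectors of the best match
    among the document's stored vectors (hypothetical helper)."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dimensional vectors, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_full = [[1.0, 0.0], [0.5, 0.5]]
doc_pruned = [[1.0, 0.0]]  # one stored vector removed

print(maxsim_score(query, doc_full))    # 1.0 + 0.5 = 1.5
print(maxsim_score(query, doc_pruned))  # 1.0 + 0.0 = 1.0
```

Because the document side stores one vector per retained unit, storing fewer vectors per document shrinks the index directly. The second call shows that dropping a vector can lower the score, which is why the abstract emphasizes removing only representations that are not essential to scoring.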

PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

Jan 05, 2022

Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury

Abstract: Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, the web domain offers large amounts of training data and a query-to-passage or query-to-document retrieval task. In this paper, we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. In order to use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM) which liberates DPR models from their limited input length. PARM retrieves documents on the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. Then the relevant results per query paragraph are aggregated into one ranked list for the whole query document. For the aggregation we propose vector-based aggregation with reciprocal rank fusion (VRRF) weighting, which combines the advantages of rank-based aggregation and topical aggregation based on the dense embeddings. Experimental results show that VRRF outperforms rank-based aggregation strategies for dense document-to-document retrieval with PARM. We compare PARM to document-level retrieval and demonstrate higher retrieval effectiveness of PARM for lexical and dense first-stage retrieval on two different legal case retrieval collections. We investigate how to train the dense retrieval model for PARM on limited target data with labels on the paragraph or document level. In addition, we analyze the differences between the results retrieved by lexical and dense retrieval with PARM.
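The aggregation step can be illustrated with plain reciprocal rank fusion (RRF), one of the rank-based strategies that the abstract compares against. This sketch is not VRRF, which additionally uses the dense embeddings. The function name `rrf_aggregate`, the constant `k=60` (a common default for RRF), and the assumption that each per-paragraph list already contains deduplicated document IDs are all illustrative choices.

```python
from collections import defaultdict

def rrf_aggregate(ranked_lists, k=60):
    """Fuse one ranked list per query paragraph into a single ranking.

    Hypothetical helper: each document receives 1 / (k + rank) from every
    list it appears in; ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# One ranked list of document IDs per query paragraph (made-up IDs).
per_paragraph = [
    ["A", "B", "C"],
    ["B", "A"],
    ["B", "D"],
]
print(rrf_aggregate(per_paragraph))  # ['B', 'A', 'D', 'C']
```

Document B ranks first because it appears near the top of all three paragraph-level lists, which is the behavior a rank-based aggregation rewards.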

* Accepted at ECIR 2022

Establishing Strong Baselines for TripClick Health Retrieval

Jan 02, 2022

Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

Abstract: We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the originally too noisy training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.

* Accepted at ECIR 2022

A Time-Optimized Content Creation Workflow for Remote Teaching

Oct 13, 2021

Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

Abstract: We describe our workflow to create an engaging remote learning experience for a university course, while minimizing the post-production time of the educators. We make use of ubiquitous and commonly free services and platforms, so that our workflow is inclusive for all educators and provides polished experiences for students. Our learning materials provide for each lecture: 1) a recorded video, uploaded on YouTube, with exact slide timestamp indices, which enables an enhanced navigation UI; and 2) a high-quality flow-text automated transcript of the narration with proper punctuation and capitalization, improved with a student participation workflow on GitHub. All these results could be created by hand in a time-consuming and costly way. However, this would generally exceed the time available for creating course materials. Our main contribution is to automate the transformation and post-production between raw narrated slides and our published materials with a custom toolchain. Furthermore, we describe our complete workflow: from content creation to transformation and distribution. Our students gave us overwhelmingly positive feedback and especially liked our use of ubiquitous platforms. The most used feature was YouTube's chapter UI, enabled through our automatically generated timestamps. The majority of students who started using the transcripts continued to do so. Every single transcript was corrected by students, with an average word-change of 6%. We conclude that our enhanced content formats are much appreciated and utilized. Importantly for educators, our low-overhead production workflow was sustainable throughout a busy semester.
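As a small illustration of the timestamp step, the sketch below turns per-slide start times into the plain-text chapter list that YouTube reads from a video description (one `timestamp title` line per chapter, starting at `0:00`). It is not the authors' toolchain: the function names and the input format, a list of `(start_seconds, slide_title)` pairs, are assumptions.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once the video passes an hour."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

def chapter_lines(slides):
    """Build a chapter list from (start_seconds, title) pairs
    (hypothetical input format)."""
    return "\n".join(f"{format_timestamp(start)} {title}" for start, title in slides)

# Made-up slide timings, for illustration only.
slides = [(0, "Introduction"), (75, "Motivation"), (3725, "Summary")]
print(chapter_lines(slides))
```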

* Accepted at SIGCSE-TS 2022

Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

Aug 12, 2020

Sebastian Hofstätter, Markus Zlabinger, Mete Sertkan, Michael Schröder, Allan Hanbury

Abstract: There are many existing retrieval and question answering datasets. However, most of them either focus on ranked list evaluation or single-candidate question answering. This divide makes it challenging to properly evaluate approaches concerned with ranking documents and providing snippets or answers for a given query. In this work, we present FiRA: a novel dataset of Fine-Grained Relevance Annotations. We extend the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage- and word-level graded relevance annotations for all relevant documents. We use our newly created data to study the distribution of relevance in long documents, as well as the attention of annotators to specific positions of the text. As an example, we evaluate the recently introduced TKL document ranking model. We find that although TKL exhibits state-of-the-art retrieval results for long documents, it misses many relevant passages.

* Accepted at CIKM 2020 (Resource Track)
