Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Krisztian Balog

Second SIGIR Workshop on Simulations for Information Access (Sim4IA 2025)

May 16, 2025

Philipp Schaer, Christin Katharina Kreutz, Krisztian Balog, Timo Breuer, Andreas Konstantin Kruff

Abstract:Simulations in information access (IA) have recently gained interest, as shown by various tutorials and workshops around that topic. Simulations can be key contributors to central IA research and evaluation questions, especially around interactive settings when real users are unavailable, or their participation is impossible due to ethical reasons. In addition, simulations in IA can help contribute to a better understanding of users, reduce complexity of evaluation experiments, and improve reproducibility. Building on recent developments in methods and toolkits, the second iteration of our Sim4IA workshop aims to again bring together researchers and practitioners to form an interactive and engaging forum for discussions on the future perspectives of the field. An additional aim is to plan an upcoming TREC/CLEF campaign.

* Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25), July 13--18, 2025, Padua, Italy

Via

Access Paper or Ask Questions

Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems

Apr 17, 2025

Ivica Kostric, Krisztian Balog, Ujwal Gadiraju

Abstract:Conversational recommender systems (CRSs) provide users with an interactive means to express preferences and receive real-time personalized recommendations. The success of these systems is heavily influenced by the preference elicitation process. While existing research mainly focuses on what questions to ask during preference elicitation, there is a notable gap in understanding what role broader interaction patterns including tone, pacing, and level of proactiveness play in supporting users in completing a given task. This study investigates the impact of different conversational styles on preference elicitation, task performance, and user satisfaction with CRSs. We conducted a controlled experiment in the context of scientific literature recommendation, contrasting two distinct conversational styles, high involvement (fast paced, direct, and proactive with frequent prompts) and high considerateness (polite and accommodating, prioritizing clarity and user comfort) alongside a flexible experimental condition where users could switch between the two. Our results indicate that adapting conversational strategies based on user expertise and allowing flexibility between styles can enhance both user satisfaction and the effectiveness of recommendations in CRSs. Overall, our findings hold important implications for the design of future CRSs.

* To appear in: Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization (UMAP '25), June 16--19, 2025, New York City, NY, USA

Via

Access Paper or Ask Questions

Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Mar 24, 2025

Krisztian Balog, Donald Metzler, Zhen Qin

Abstract:Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation. This widespread adoption necessitates a critical examination of potential biases arising from the interplay between these LLM-based components. This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges. We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers. Furthermore, we observe limitations in LLM judges' ability to discern subtle system performance differences. Contrary to some previous findings, our preliminary study does not find evidence of bias against AI-generated content. These results highlight the need for a more holistic view of the LLM-driven information ecosystem. To this end, we offer initial guidelines and a research agenda to ensure the reliable use of LLMs in IR evaluation.

Via

Access Paper or Ask Questions

MultiConAD: A Unified Multilingual Conversational Dataset for Early Alzheimer's Detection

Feb 26, 2025

Arezo Shakeri, Mina Farmanbar, Krisztian Balog

Abstract:Dementia is a progressive cognitive syndrome with Alzheimer's disease (AD) as the leading cause. Conversation-based AD detection offers a cost-effective alternative to clinical methods, as language dysfunction is an early biomarker of AD. However, most prior research has framed AD detection as a binary classification problem, limiting the ability to identify Mild Cognitive Impairment (MCI)-a crucial stage for early intervention. Also, studies primarily rely on single-language datasets, mainly in English, restricting cross-language generalizability. To address this gap, we make three key contributions. First, we introduce a novel, multilingual dataset for AD detection by unifying 16 publicly available dementia-related conversational datasets. This corpus spans English, Spanish, Chinese, and Greek and incorporates both audio and text data derived from a variety of cognitive assessment tasks. Second, we perform finer-grained classification, including MCI, and evaluate various classifiers using sparse and dense text representations. Third, we conduct experiments in monolingual and multilingual settings, finding that some languages benefit from multilingual training while others perform better independently. This study highlights the challenges in multilingual AD detection and enables future research on both language-specific approaches and techniques aimed at improving model generalization and robustness.

* 11 pages, 3 Figures

Via

Access Paper or Ask Questions

User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

Jan 08, 2025

Krisztian Balog, ChengXiang Zhai

Abstract:User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system, enabling researchers to model and analyze user behaviour, generate synthetic data for training, and evaluate interactive AI systems in a controlled and reproducible manner. User simulation has profound implications for diverse fields and plays a vital role in the pursuit of Artificial General Intelligence. This paper provides an overview of user simulation, highlighting its key applications, connections to various disciplines, and outlining future research directions to advance this increasingly important technology.

Via

Access Paper or Ask Questions

CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Dec 13, 2024

Nolwenn Bernard, Hideaki Joko, Faegheh Hasibi, Krisztian Balog

Figure 1 for CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Figure 2 for CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Figure 3 for CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Figure 4 for CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Abstract:We introduce CRS Arena, a research platform for scalable benchmarking of Conversational Recommender Systems (CRS) based on human feedback. The platform displays pairwise battles between anonymous conversational recommender systems, where users interact with the systems one after the other before declaring either a winner or a draw. CRS Arena collects conversations and user feedback, providing a foundation for reliable evaluation and ranking of CRSs. We conduct experiments with CRS Arena on both open and closed crowdsourcing platforms, confirming that both setups produce highly correlated rankings of CRSs and conversations with similar characteristics. We release CRSArena-Dial, a dataset of 474 conversations and their corresponding user feedback, along with a preliminary ranking of the systems based on the Elo rating system. The platform is accessible at https://iai-group-crsarena.hf.space/.

* Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (WSDM '25), March 10--14, 2025, Hannover, Germany

Via

Access Paper or Ask Questions

Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking?

Oct 28, 2024

Weronika Łajewska, Krisztian Balog, Damiano Spina, Johanne Trippas

Figure 1 for Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking?

Figure 2 for Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking?

Figure 3 for Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking?

Figure 4 for Can Users Detect Biases or Factual Errors in Generated Responses in Conversational Information-Seeking?

Abstract:Information-seeking dialogues span a wide range of questions, from simple factoid to complex queries that require exploring multiple facets and viewpoints. When performing exploratory searches in unfamiliar domains, users may lack background knowledge and struggle to verify the system-provided information, making them vulnerable to misinformation. We investigate the limitations of response generation in conversational information-seeking systems, highlighting potential inaccuracies, pitfalls, and biases in the responses. The study addresses the problem of query answerability and the challenge of response incompleteness. Our user studies explore how these issues impact user experience, focusing on users' ability to identify biased, incorrect, or incomplete responses. We design two crowdsourcing tasks to assess user experience with different system response variants, highlighting critical issues to be addressed in future conversational information-seeking research. Our analysis reveals that it is easier for users to detect response incompleteness than query answerability and user satisfaction is mostly associated with response diversity, not factual correctness.

* Extended version of the paper that appeared in the Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP '24)

Via

Access Paper or Ask Questions

Report on the Workshop on Simulations for Information Access (Sim4IA 2024) at SIGIR 2024

Sep 26, 2024

Timo Breuer, Christin Katharina Kreutz, Norbert Fuhr, Krisztian Balog, Philipp Schaer, Nolwenn Bernard, Ingo Frommholz, Marcel Gohsen, Kaixin Ji, Gareth J. F. Jones(+8 more)

Abstract:This paper is a report of the Workshop on Simulations for Information Access (Sim4IA) workshop at SIGIR 2024. The workshop had two keynotes, a panel discussion, nine lightning talks, and two breakout sessions. Key takeaways were user simulation's importance in academia and industry, the possible bridging of online and offline evaluation, and the issues of organizing a companion shared task around user simulations for information access. We report on how we organized the workshop, provide a brief overview of what happened at the workshop, and summarize the main topics and findings of the workshop and future work.

* Preprint of a SIGIR Forum submission for Vol. 58 No. 2 - December 2024

Via

Access Paper or Ask Questions

Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions

Aug 15, 2024

Krisztian Balog, John Palowitch, Barbara Ikica, Filip Radlinski, Hamidreza Alvari, Mehdi Manshadi

Abstract:The emergence of synthetic data represents a pivotal shift in modern machine learning, offering a solution to satisfy the need for large volumes of data in domains where real data is scarce, highly private, or difficult to obtain. We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content, noting that such content is increasingly prevalent and a source of frequently sought information. Large language models (LLMs) offer a starting point for generating synthetic social media discussion threads, due to their ability to produce diverse responses that typify online interactions. However, as we demonstrate, straightforward application of LLMs yields limited success in capturing the complex structure of online discussions, and standard prompting mechanisms lack sufficient control. We therefore propose a multi-step generation process, predicated on the idea of creating compact representations of discussion threads, referred to as scaffolds. Our framework is generic yet adaptable to the unique characteristics of specific social media platforms. We demonstrate its feasibility using data from two distinct online discussion platforms. To address the fundamental challenge of ensuring the representativeness and realism of synthetic data, we propose a portfolio of evaluation measures to compare various instantiations of our framework.

Via

Access Paper or Ask Questions

A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval

Jun 27, 2024

Ivica Kostric, Krisztian Balog

Figure 1 for A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval

Abstract:Conversational passage retrieval is challenging as it often requires the resolution of references to previous utterances and needs to deal with the complexities of natural language, such as coreference and ellipsis. To address these challenges, pre-trained sequence-to-sequence neural query rewriters are commonly used to generate a single de-contextualized query based on conversation history. Previous research shows that combining multiple query rewrites for the same user utterance has a positive effect on retrieval performance. We propose the use of a neural query rewriter to generate multiple queries and show how to integrate those queries in the passage retrieval pipeline efficiently. The main strength of our approach lies in its simplicity: it leverages how the beam search algorithm works and can produce multiple query rewrites at no additional cost. Our contributions further include devising ways to utilize multi-query rewrites in both sparse and dense first-pass retrieval. We demonstrate that applying our approach on top of a standard passage retrieval pipeline delivers state-of-the-art performance without sacrificing efficiency.

* Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Via

Access Paper or Ask Questions