Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Oren Kurland

On the Merits of LLM-Based Corpus Enrichment

Jun 06, 2025

Gal Zur, Tommy Mordo, Moshe Tennenholtz, Oren Kurland

Abstract:Generative AI (genAI) technologies -- specifically, large language models (LLMs) -- and search have evolving relations. We argue for a novel perspective: using genAI to enrich a document corpus so as to improve query-based retrieval effectiveness. The enrichment is based on modifying existing documents or generating new ones. As an empirical proof of concept, we use LLMs to generate documents relevant to a topic which are more retrievable than existing ones. In addition, we demonstrate the potential merits of using corpus enrichment for retrieval augmented generation (RAG) and answer attribution in question answering.

Via

Access Paper or Ask Questions

A Multi-Agent Perspective on Modern Information Retrieval

Feb 20, 2025

Haya Nachimovsky, Moshe Tennenholtz, Oren Kurland

Abstract:The rise of large language models (LLMs) has introduced a new era in information retrieval (IR), where queries and documents that were once assumed to be generated exclusively by humans can now also be created by automated agents. These agents can formulate queries, generate documents, and perform ranking. This shift challenges some long-standing IR paradigms and calls for a reassessment of both theoretical frameworks and practical methodologies. We advocate for a multi-agent perspective to better capture the complex interactions between query agents, document agents, and ranker agents. Through empirical exploration of various multi-agent retrieval settings, we reveal the significant impact of these interactions on system performance. Our findings underscore the need to revisit classical IR paradigms and develop new frameworks for more effective modeling and evaluation of modern retrieval systems.

Via

Access Paper or Ask Questions

Prompt-Based Document Modifications In Ranking Competitions

Feb 11, 2025

Niv Bardas, Tommy Mordo, Oren Kurland, Moshe Tennenholtz, Gal Zur

Abstract:We study prompting-based approaches with Large Language Models (LLMs) for modifying documents so as to promote their ranking in a competitive search setting. Our methods are inspired by prior work on leveraging LLMs as rankers. We evaluate our approach by deploying it as a bot in previous ranking competitions and in competitions we organized. Our findings demonstrate that our approach effectively improves document ranking while preserving high levels of faithfulness to the original content and maintaining overall document quality.

Via

Access Paper or Ask Questions

Search results diversification in competitive search

Jan 24, 2025

Tommy Mordo, Itamar Reinman, Moshe Tennenholtz, Oren Kurland

Abstract:In Web retrieval, there are many cases of competition between authors of Web documents: their incentive is to have their documents highly ranked for queries of interest. As such, the Web is a prominent example of a competitive search setting. Past work on competitive search focused on ranking functions based solely on relevance estimation. We study ranking functions that integrate a results-diversification aspect. We show that the competitive search setting with diversity-based ranking has an equilibrium. Furthermore, we theoretically and empirically show that the phenomenon of authors mimicking content in documents highly ranked in the past, which was demonstrated in previous work, is mitigated when search results diversification is applied.

Via

Access Paper or Ask Questions

Competitive Retrieval: Going Beyond the Single Query

Apr 14, 2024

Haya Nachimovsky, Moshe Tennenholtz, Fiana Raiber, Oren Kurland

Abstract:Previous work on the competitive retrieval setting focused on a single-query setting: document authors manipulate their documents so as to improve their future ranking for a given query. We study a competitive setting where authors opt to improve their document's ranking for multiple queries. We use game theoretic analysis to prove that equilibrium does not necessarily exist. We then empirically show that it is more difficult for authors to improve their documents' rankings for multiple queries with a neural ranker than with a state-of-the-art feature-based ranker. We also present an effective approach for predicting the document most highly ranked in the next induced ranking.

Via

Access Paper or Ask Questions

A Dataset for Sentence Retrieval for Open-Ended Dialogues

May 24, 2022

Itay Harel, Hagai Taitelbaum, Idan Szpektor, Oren Kurland

Figure 1 for A Dataset for Sentence Retrieval for Open-Ended Dialogues

Figure 2 for A Dataset for Sentence Retrieval for Open-Ended Dialogues

Figure 3 for A Dataset for Sentence Retrieval for Open-Ended Dialogues

Figure 4 for A Dataset for Sentence Retrieval for Open-Ended Dialogues

Abstract:We address the task of sentence retrieval for open-ended dialogues. The goal is to retrieve sentences from a document corpus that contain information useful for generating the next turn in a given dialogue. Prior work on dialogue-based retrieval focused on specific types of dialogues: either conversational QA or conversational search. To address a broader scope of this task where any type of dialogue can be used, we constructed a dataset that includes open-ended dialogues from Reddit, candidate sentences from Wikipedia for each dialogue and human annotations for the sentences. We report the performance of several retrieval baselines, including neural retrieval models, over the dataset. To adapt neural models to the types of dialogues in the dataset, we explored an approach to induce a large-scale weakly supervised training data from Reddit. Using this training set significantly improved the performance over training on the MS MARCO dataset.

Via

Access Paper or Ask Questions

Driving the Herd: Search Engines as Content Influencers

Oct 21, 2021

Gregory Goren, Oren Kurland, Moshe Tennenholtz, Fiana Raiber

Figure 1 for Driving the Herd: Search Engines as Content Influencers

Figure 2 for Driving the Herd: Search Engines as Content Influencers

Figure 3 for Driving the Herd: Search Engines as Content Influencers

Figure 4 for Driving the Herd: Search Engines as Content Influencers

Abstract:In competitive search settings such as the Web, many documents' authors (publishers) opt to have their documents highly ranked for some queries. To this end, they modify the documents - specifically, their content - in response to induced rankings. Thus, the search engine affects the content in the corpus via its ranking decisions. We present a first study of the ability of search engines to drive pre-defined, targeted, content effects in the corpus using simple techniques. The first is based on the herding phenomenon - a celebrated result from the economics literature - and the second is based on biasing the relevance ranking function. The types of content effects we study are either topical or touch on specific document properties - length and inclusion of query terms. Analysis of ranking competitions we organized between incentivized publishers shows that the types of content effects we target can indeed be attained by applying our suggested techniques. These findings have important implications with regard to the role of search engines in shaping the corpus.

Via

Access Paper or Ask Questions

Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Apr 22, 2008

Oren Kurland, Lillian Lee

Figure 1 for Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Figure 2 for Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Figure 3 for Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Figure 4 for Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models

Abstract:We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them. We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based re-ranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.

* Proceedings of SIGIR 2006, pp 83--90

Via

Access Paper or Ask Questions

Better than the real thing? Iterative pseudo-query processing using cluster-based language models

Jan 11, 2006

Oren Kurland, Lillian Lee, Carmel Domshlak

Figure 1 for Better than the real thing? Iterative pseudo-query processing using cluster-based language models

Figure 2 for Better than the real thing? Iterative pseudo-query processing using cluster-based language models

Figure 3 for Better than the real thing? Iterative pseudo-query processing using cluster-based language models

Figure 4 for Better than the real thing? Iterative pseudo-query processing using cluster-based language models

Abstract:We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses language models induced from both documents and clusters. First, we treat the pseudo-feedback documents produced in response to the original query as a set of pseudo-queries that themselves can serve as input to the retrieval process. Observing that the documents returned in response to the pseudo-queries can then act as pseudo-queries for subsequent rounds, we arrive at a formulation of pseudo-query-based retrieval as an iterative process. Experiments show that several concrete instantiations of this idea, when applied in conjunction with techniques designed to heighten precision, yield performance results rivaling those of a number of previously-proposed algorithms, including the standard language-modeling approach. The use of cluster-based language models is a key contributing factor to our algorithms' success.

* Proceedings of SIGIR 2005, pp. 19--26

Via

Access Paper or Ask Questions

PageRank without hyperlinks: Structural re-ranking using links induced by language models

Jan 11, 2006

Oren Kurland, Lillian Lee

Figure 1 for PageRank without hyperlinks: Structural re-ranking using links induced by language models

Figure 2 for PageRank without hyperlinks: Structural re-ranking using links induced by language models

Figure 3 for PageRank without hyperlinks: Structural re-ranking using links induced by language models

Abstract:Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.

* Proceedings of SIGIR 2005, pp. 306--313

Via

Access Paper or Ask Questions