A Multi-Agent Perspective on Modern Information Retrieval (2502.14796v1)

Published 20 Feb 2025 in cs.IR

Abstract: The rise of LLMs has introduced a new era in information retrieval (IR), where queries and documents that were once assumed to be generated exclusively by humans can now also be created by automated agents. These agents can formulate queries, generate documents, and perform ranking. This shift challenges some long-standing IR paradigms and calls for a reassessment of both theoretical frameworks and practical methodologies. We advocate for a multi-agent perspective to better capture the complex interactions between query agents, document agents, and ranker agents. Through empirical exploration of various multi-agent retrieval settings, we reveal the significant impact of these interactions on system performance. Our findings underscore the need to revisit classical IR paradigms and develop new frameworks for more effective modeling and evaluation of modern retrieval systems.

The paper posits that the rise of LLMs has ushered in a new era in Information Retrieval (IR), necessitating a shift towards a multi-agent perspective. This perspective aims to capture the complex interactions between query agents, document agents, and ranker agents. The authors advocate for re-evaluating classical IR paradigms and developing new frameworks for more effective modeling and evaluation of modern retrieval systems.

The paper identifies three crucial agent types in modern retrieval settings: query agents, document agents, and ranker agents. The mutual effects among these agents, it is argued, necessitate a re-consideration of classical retrieval paradigms and frameworks, with far-reaching implications for ad hoc retrieval evaluation. The paper challenges the generative theory of relevance, which posits that terms in the query and in relevant documents are generated by the same underlying language model. Since query and document agents may rely on different LLMs, this assumption may no longer hold; the resulting misalignment is conceptually reminiscent of cross-lingual retrieval. The authors also note that LLM-based rankers can be biased toward LLM-generated content. The paper further addresses ranking incentives in competitive search settings, where document authors modify their documents to improve their rankings, leading to herding effects and reduced topical diversity. Finally, the authors critique the Cranfield evaluation paradigm, arguing that static test collections cannot support the evaluation of corpus effects driven by document agents responding to induced rankings.
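The generative view the authors revisit is the classical language-modeling account of relevance; the following is a minimal sketch of that standard formulation (textbook IR notation, not the paper's own formalism):

```latex
% Classical generative view of relevance: the query q and a relevant document d
% are treated as samples from the same underlying language model \theta_R.
% Query-likelihood ranking instantiates this view by scoring d according to
% how likely the query is under the document's language model:
\mathrm{score}(d; q) \;=\; P(q \mid \theta_d) \;=\; \prod_{w \in q} P(w \mid \theta_d)
```

When the query is produced by one LLM and the document by another, the two texts are no longer plausibly samples from a shared generative source, which is the misalignment the paper likens to cross-lingual retrieval.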

The authors advocate for re-considering ad hoc retrieval fundamentals in multi-agent settings. They empirically illustrate the multi-agent retrieval setting using lexical (TF.IDF-based), semantic (embedding-based), and LLM-based approaches to devise document, ranker, and query agents. The empirical findings show that when the query agent and ranker agent are of different types, retrieval effectiveness degrades. Additionally, misalignment between document and ranker agents reduces the ability of the document agent to promote its document in rankings.
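To make the notion of agent "type" concrete, here is a minimal sketch of a lexical (TF.IDF-based) and a semantic (embedding-based) ranker agent. The library choices (scikit-learn, sentence-transformers), class names, and the E5 checkpoint are illustrative assumptions, not the paper's implementation.

```python
# Sketch of two ranker-agent "types": lexical (TF.IDF) vs. semantic (embeddings).
# Libraries, class names, and model checkpoint are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer


class LexicalRanker:
    def __init__(self, corpus):
        self.corpus = corpus
        self.vectorizer = TfidfVectorizer()
        self.doc_vecs = self.vectorizer.fit_transform(corpus)

    def rank(self, query):
        # Score documents by cosine similarity in TF.IDF space.
        q_vec = self.vectorizer.transform([query])
        scores = cosine_similarity(q_vec, self.doc_vecs)[0]
        return sorted(zip(self.corpus, scores), key=lambda x: -x[1])


class SemanticRanker:
    def __init__(self, corpus, model_name="intfloat/e5-base-v2"):
        self.corpus = corpus
        self.model = SentenceTransformer(model_name)
        self.doc_vecs = self.model.encode(corpus)

    def rank(self, query):
        # Score documents by cosine similarity in dense embedding space.
        q_vec = self.model.encode([query])
        scores = cosine_similarity(q_vec, self.doc_vecs)[0]
        return sorted(zip(self.corpus, scores), key=lambda x: -x[1])
```

An LLM-based ranker agent would replace the similarity scoring with a prompted relevance judgment, but would expose the same ranking interface.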

The paper explores the role of the ranker agent, whose goal is to induce a ranking in response to a query. It suggests that the Probability Ranking Principle (PRP) remains optimal under specific conditions, namely that the relevance of different documents is independent and that users share the same utility function. However, frameworks and paradigms for relevance estimation can be significantly affected by the fact that the query and the document may themselves have been generated by query and document agents. The authors discuss examples of relevance-estimation paradigms that should be re-considered, including the relevance model, the risk minimization framework, and the axiomatic framework. They propose that ranker agents should address the query agent through query-intent identification and agent (type) identification, and address the document agent by identifying its type so that document relevance is estimated with the generating agent in mind.
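For reference, the PRP in its standard formulation (not specific to this paper): documents are presented in decreasing order of estimated relevance probability, which maximizes expected utility only under the two conditions noted above.

```latex
% Probability Ranking Principle: return documents in decreasing order of
% estimated probability of relevance to the query,
d_{(1)}, d_{(2)}, \ldots \quad \text{with} \quad
P(R{=}1 \mid d_{(1)}, q) \;\ge\; P(R{=}1 \mid d_{(2)}, q) \;\ge\; \cdots
% This ordering is optimal in expectation only if relevance is independent
% across documents and all users share the same utility function.
```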

The paper discusses the query agent, which formulates queries that reflect human intent while accounting for potential biases, and which can be integrated with other agents to mitigate the biases of ranker agents. Unlike human users, query agents can generate query variations at scale and evaluate them systematically, learning from past user interactions and generalizing over users to refine query generation. The paper also considers strategic query agents, which may deliberately tailor their queries to align with the ranker agent or with a specific document agent.
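A minimal sketch of the large-scale query-variation loop described above, assuming a hypothetical generate_variants paraphraser and a ranker exposing a rank(query) method; precision@k stands in for whichever retrieval metric the agent actually optimizes.

```python
# Sketch of a query agent that generates query variants and keeps the one
# whose top-k ranking best covers known-relevant documents. The helper
# `generate_variants` and the ranker interface are hypothetical placeholders.

def select_best_query(intent, ranker, relevant_ids, generate_variants, k=10):
    best_query, best_score = intent, -1.0
    for variant in generate_variants(intent):            # e.g., LLM paraphrases
        top_k = [doc_id for doc_id, _ in ranker.rank(variant)[:k]]
        score = len(set(top_k) & set(relevant_ids)) / k   # crude precision@k
        if score > best_score:
            best_query, best_score = variant, score
    return best_query
```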

The paper analyzes the document agent, which is responsible for generating or modifying documents, with LLMs capable of creating original content. It distinguishes strategic from non-strategic document modifications and considers how to design mechanisms that disincentivize strategic modifications without penalizing non-strategic updates. The paper also reviews the "mimicking the winner" phenomenon and the need for document agents to adapt to ranker biases.
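The "mimicking the winner" dynamic can be sketched as a strategic document agent that borrows content from the previously top-ranked document; the sentence-selection heuristic below is an illustrative assumption, not the paper's mechanism.

```python
# Sketch of a strategic "mimic the winner" document agent: it copies a sentence
# from the previously top-ranked document into its own document. Picking the
# winner's first sentence is a crude illustrative heuristic.

def mimic_the_winner(own_doc: str, ranking: list[str]) -> str:
    winner = ranking[0]
    if winner == own_doc:
        return own_doc                   # already ranked first; no change
    borrowed = winner.split(". ")[0]     # stand-in for "salient content" selection
    return borrowed + ". " + own_doc
```

Repeated across authors and rounds, this kind of imitation is what produces the herding effects and reduced topical diversity discussed earlier.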

Finally, the paper addresses the need to re-consider evaluation in multi-agent retrieval settings. The authors argue that the effect of the interplay among ranker, document, and query agents on retrieval performance calls for a re-consideration of evaluation, and that test collections should be constructed with explicit reference to the agents that generate documents and queries. The paper advocates simulation-based evaluation in multi-agent retrieval settings, in which designers of new retrieval methods (ranker agents) can evaluate them in an online manner.
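A minimal sketch of what such simulation-based, online evaluation could look like: a round-based loop in which the ranker induces a ranking, document agents respond to it, and a quality metric is logged per round. The interfaces and metric here are placeholders, not the actual framework used in the paper's experiments.

```python
# Round-based simulation sketch: rank, let document agents respond to the
# induced ranking, record a per-round quality metric. All interfaces are
# illustrative placeholders.

def simulate_competition(query, ranker, doc_agents, rounds, metric):
    docs = {name: agent.initial_doc for name, agent in doc_agents.items()}
    history = []
    for _ in range(rounds):
        ranking = ranker.rank(query, list(docs.values()))    # induced ranking
        history.append(metric(query, ranking))               # e.g., nDCG, diversity
        for name, agent in doc_agents.items():
            docs[name] = agent.respond(docs[name], ranking)  # strategic update
    return history
```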

The paper empirically explores the effects of interactions between different query, document, and ranker agents through a series of three experiments. The first experiment evaluates the effectiveness of different rankers with various query and document agents. The second explores the interplay between document and ranker agents, specifically when document agents compete against human-authored documents for rank promotion. The third studies how ranker and query agents influence the competitive dynamics among different document agents. Datasets from ranking competitions, in which human participants competed against automated agents, were utilized. The paper considers lexical, semantic, and LLM implementations of ranker, query, and document agents, as well as human agents.

The first experiment assesses the effectiveness of different ranker agents when queries and documents are generated by query and document agents, respectively. The experiment uses three corpora: human (only human-generated documents), LLM (solely LLM-generated documents), and mixed (combination of both). The results show that across all rankers and query agent types, retrieval performance is significantly lower on a mixed corpus compared to a corpus with documents generated by a single agent (human or LLM). The results also show that retrieval effectiveness varies not only between human and LLM-generated queries but also significantly among different query agents of the same type.

In the second experiment, the authors perform an offline evaluation on the ranking-competition datasets to contrast the effectiveness of document agents, relative to humans, at promoting documents in rankings. The experiment finds that the zero-shot lexical and semantic document agents consistently achieve higher scaled rank-promotion values than humans across all ranker agents. It also demonstrates that a mismatch between the document agent type and the ranker agent type leads to a substantial decrease in the document agent's ability to improve the ranking of its document.

The third experiment is an online evaluation that simulates a ranking competition between different document agents, using the CSP framework. The simulation includes two semantic document agents (E5 and Contriever), and two LLM document agents (Gemma and Llama), as well as a static agent that does not modify its initial document. The results show that the static document agent demonstrates superior performance when the query agent type and ranker agent type are not aligned, indicating that when the query and ranker types are mismatched, the ability of publishers to perform ranking-incentivized document manipulations decreases.

Authors (3)
  1. Haya Nachimovsky (5 papers)
  2. Moshe Tennenholtz (97 papers)
  3. Oren Kurland (17 papers)