Analyzing the Sociotechnical Dynamics of Answer Engines in AI-Based Search
The paper "Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses" presents a comprehensive paper on the limitations and societal implications of Answer Engines. As LLMs become increasingly integrated into daily information retrieval tasks, they are metamorphosing from research instruments into influential technologies. This transformation demands an acute understanding of their utility and impact beyond the surface level, especially within the sociotechnical framework that this paper examines.
Key Findings from the Usability Study
The authors conducted an audit-style usability study with 21 participants, comparing answer engines against traditional search engines. Through this study, they identified 16 core limitations of answer engines, grouped by the four main components of such a system: the generated answer text, citations, sources, and user interface. Three of the most consequential limitations are:
- Lack of Objective Detail and Balance: Participants found that answers often lacked necessary depth and presented one-sided perspectives, limiting the exploration of diverse views, particularly for opinionated or debate-oriented queries.
- Overconfidence and Improper Source Attribution: Answer engines often expressed unjustified confidence in their responses and frequently misattributed citations, undermining trust in the factuality of the information they present (a sketch after this list illustrates one way to flag such overconfidence).
- User Autonomy and Source Transparency: Participants reported little control over source selection and verification, a consequence of largely opaque system design that erodes users' trust and their ability to verify information accuracy.
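To make the overconfidence issue concrete, here is a minimal sketch of a hedging-based confidence proxy. This is not the paper's protocol; the hedge lexicon, the scoring heuristic, and the scale factor are all illustrative assumptions.

```python
import re

# Illustrative hedge lexicon; a real audit would use a validated word list.
HEDGES = {"may", "might", "could", "possibly", "reportedly",
          "suggests", "unclear", "depends", "sometimes"}

def confidence_proxy(answer: str) -> float:
    """Crude confidence score in [0, 1]: 1.0 means no hedging at all.

    Counts hedge words relative to answer length. An unhedged answer to
    a genuinely debatable question is a red flag for overconfidence.
    """
    tokens = re.findall(r"[a-z']+", answer.lower())
    if not tokens:
        return 0.0
    hedge_rate = sum(t in HEDGES for t in tokens) / len(tokens)
    return max(0.0, 1.0 - 10 * hedge_rate)  # scale factor is arbitrary

print(confidence_proxy("Coffee is unhealthy."))                  # 1.0
print(confidence_proxy("Coffee may be unhealthy; it depends."))  # much lower
```

In a real evaluation, such a proxy would be paired with the question type, since high confidence is only problematic when the question does not admit a single settled answer.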
Quantitative Evaluation Metrics and Results
Building on insights from the study, the authors propose eight evaluation metrics for systematically assessing answer engines, covering aspects such as citation accuracy, statement relevance, and source necessity. Applying this framework to three popular answer engines (You.com, Perplexity.ai, and BingChat) revealed substantial room for improvement: all three frequently generated one-sided and overconfident answers, with Perplexity underperforming notably because it expressed high confidence regardless of the nature of the question. Two of these metrics are sketched below.
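The paper's exact formulas may differ; the following is a minimal sketch under assumed definitions: citation accuracy as the fraction of citation links whose source passage supports the cited statement, and source necessity as the fraction of listed sources the answer actually cites. The `supports` judge is a pluggable assumption (a human annotator or an NLI model in practice).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Statement:
    text: str                  # one sentence from the generated answer
    cited_passages: List[str]  # source passages attached via citations

def citation_accuracy(statements: List[Statement],
                      supports: Callable[[str, str], bool]) -> float:
    """Fraction of citations whose source passage supports the statement."""
    pairs = [(s.text, p) for s in statements for p in s.cited_passages]
    if not pairs:
        return 0.0
    return sum(supports(text, passage) for text, passage in pairs) / len(pairs)

def source_necessity(cited_ids: List[str], listed_ids: List[str]) -> float:
    """Fraction of the sources an engine lists that its answer actually cites."""
    listed = set(listed_ids)
    if not listed:
        return 0.0
    return len(set(cited_ids) & listed) / len(listed)

# Toy usage with a substring heuristic standing in for a real support judge.
stmts = [Statement("Coffee raises heart rate.",
                   ["Trials show coffee raises heart rate briefly."])]
print(citation_accuracy(stmts, lambda t, p: t.lower().rstrip(".") in p.lower()))
print(source_necessity(["s1"], ["s1", "s2", "s3"]))  # 1 of 3 listed sources used
```

Aggregating such scores per engine is what allows the head-to-head comparison the authors report, and a low source-necessity score is itself a usability finding: it suggests the interface lists sources that do little work in the answer.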
Broader Implications
Practically, the findings underscore the need for continuous evaluation and transparency as these systems spread into sociotechnical domains such as healthcare and education. Theoretically, they raise the question of how autonomous search systems might evolve into more comprehensive decision-making tools. As these technologies mature, their influence on users' critical thinking and information verification practices demands scrutiny.
Future Developments
Looking forward, the field may advance through improved interaction models that incorporate human feedback and better contextual understanding. Establishing robust governance structures and policies around AI applications remains crucial for mitigating bias and maintaining ethical standards in information dissemination.
Conclusion
This paper emphasizes that answer engines must be not only powerful at generating useful information but also aligned with ethical, transparent practices that support user empowerment. By conducting a meticulous audit, the authors contribute substantially to the discourse on AI-driven information retrieval, setting a precedent for evaluating future AI technologies as they integrate into societal frameworks.