Explainability for Transparent Conversational Information-Seeking (2405.03303v1)

Published 6 May 2024 in cs.IR and cs.HC

Abstract: The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses that are useful from a user's perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning (1) the impact of the quality of the explanations accompanying a response on its usefulness and (2) the ways of presenting explanations to users. Analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the explanation presentation format suggest that it may not be a critical factor in this setting.

Authors (4)
  1. Weronika Łajewska (15 papers)
  2. Damiano Spina (29 papers)
  3. Johanne Trippas (5 papers)
  4. Krisztian Balog (76 papers)
Citations (5)