Prompting Datasets: Data Discovery with Conversational Agents
Abstract: Can LLMs assist in data discovery? Data discovery predominantly happens via search on a data portal or the web, followed by assessment of the dataset to ensure it is fit for the intended purpose. The ability of conversational generative AI (CGAI) to support recommendations with reasoning implies that it can suggest datasets to users, explain why it has done so, and provide information akin to documentation about the dataset to support the decision of whether to use it. We hold three workshops with data users and find that, despite limitations around web capabilities, CGAIs are able to suggest relevant datasets and support many of the required sensemaking activities, as well as dataset analysis and manipulation. However, CGAIs may also suggest fictional datasets and perform inaccurate analyses. We identify emerging practices in data discovery and present a model of these practices to inform future research directions and data prompt design.