Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities (2310.16164v1)
Abstract: Large language models (LLMs) are increasingly employed in data science for tasks like data preprocessing and analytics. However, data scientists encounter substantial obstacles when conversing with LLM-powered chatbots and acting on their suggestions and answers. We conducted a mixed-methods study, including contextual observations, semi-structured interviews (n=14), and a survey (n=114), to identify these challenges. Our findings highlight key issues faced by data scientists, including contextual data retrieval, formulating prompts for complex tasks, adapting generated code to local environments, and refining prompts iteratively. Based on these insights, we propose actionable design recommendations, such as data brushing to support context selection and inquisitive feedback loops to improve communication with AI-based assistants in data-science tools.
Authors:
- Bhavya Chopra
- Ananya Singha
- Anna Fariha
- Sumit Gulwani
- Chris Parnin
- Ashish Tiwari
- Austin Z. Henley