Beyond Generating Code: Evaluating GPT on a Data Visualization Course (2306.02914v3)
Abstract: This paper presents an empirical evaluation of the Generative Pre-trained Transformer (GPT) model's performance in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate visualization code, this study goes beyond code generation to evaluate GPT's abilities across a range of visualization tasks, including data interpretation, visualization design, visual data exploration, and insight communication. The evaluation used GPT-3.5 and GPT-4 to complete CS171 assignments and comprised a quantitative assessment based on the established course rubrics, a qualitative analysis informed by the feedback of three experienced graders, and an exploratory study of GPT's capabilities in completing broader visualization tasks. Findings show that GPT-4 scored 80% on quizzes and homework, and that teaching fellows (TFs) could distinguish between GPT- and human-generated homework with 70% accuracy. The study also demonstrates GPT's potential in completing various visualization tasks, such as data cleanup, interaction with visualizations, and insight communication. The paper concludes by discussing the strengths and limitations of GPT in data visualization, potential avenues for incorporating GPT into broader visualization tasks, and the need to redesign visualization education.