Beyond Generating Code: Evaluating GPT on a Data Visualization Course (2306.02914v3)

Published 5 Jun 2023 in cs.HC and cs.GR

Abstract: This paper presents an empirical evaluation of the performance of the Generative Pre-trained Transformer (GPT) model in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate code for visualizations, this study goes beyond code generation to evaluate GPT's abilities in various visualization tasks, such as data interpretation, visualization design, visual data exploration, and insight communication. The evaluation utilized GPT-3.5 and GPT-4 to complete assignments of CS171, and included a quantitative assessment based on the established course rubrics, a qualitative analysis informed by the feedback of three experienced graders, and an exploratory study of GPT's capabilities in completing broader visualization tasks. Findings show that GPT-4 scored 80% on quizzes and homework, and teaching fellows (TFs) could distinguish between GPT- and human-generated homework with 70% accuracy. The study also demonstrates GPT's potential in completing various visualization tasks, such as data cleanup, interaction with visualizations, and insight communication. The paper concludes by discussing the strengths and limitations of GPT in data visualization, potential avenues for incorporating GPT in broader visualization tasks, and the need to redesign visualization education.


Summary

  • The paper evaluates GPT-3.5 and GPT-4 by assessing their performance on data visualization tasks using Harvard’s CS171 course rubrics.
  • The paper reveals that while GPT-4 can interact with visualizations and generate code effectively, it struggles with multi-file reasoning and image-based inputs.
  • The paper suggests that integrating AI into visualization education could drive curriculum redesigns that focus on creative, human-centric problem solving.

Background and Introduction

Generative Pre-trained Transformers, or GPT models, have garnered significant attention for their ability to produce human-like text across a wide range of natural language processing tasks. Developed by OpenAI, variants such as GPT-3 and GPT-4 have demonstrated their prowess in generating code, passing professional exams, and even hinting at early forms of artificial general intelligence. In the domain of data visualization, however, these models have mostly been scrutinized for their code-generation abilities and have not been extensively evaluated on other visualization tasks, such as design or data insight communication.

Exploring GPT's Role in Data Visualization

In a study conducted on Harvard's CS171 data visualization course, researchers assessed the performance of GPT-3.5 and GPT-4 beyond traditional code generation. They examined the models' abilities to interpret data, design visualizations, explore visual data, and communicate insights effectively. Utilizing established course rubrics and feedback from three experienced graders, they conducted quantitative and qualitative analyses of GPT's outputs. Both GPT-3.5 and GPT-4 completed the course's quizzes and homework assignments, with GPT-4 scoring 80%. Interestingly, graders could distinguish the AI's work from human submissions with roughly 70% accuracy, although they were misled at times, mistaking AI-generated work for human submissions in some instances.
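The graders' distinguishability result can be illustrated as a simple accuracy computation. This is a sketch, not the paper's analysis code; the author labels and guesses below are hypothetical.

```python
# Hypothetical labels: which submissions were GPT- vs. human-generated,
# and what a grader guessed for each one.
true_authors = ["gpt", "human", "gpt", "human", "gpt",
                "human", "gpt", "human", "gpt", "human"]
grader_guesses = ["gpt", "human", "human", "human", "gpt",
                  "gpt", "gpt", "human", "human", "human"]

# Accuracy = fraction of submissions whose origin the grader identified.
correct = sum(t == g for t, g in zip(true_authors, grader_guesses))
accuracy = correct / len(true_authors)
print(f"distinguishability accuracy: {accuracy:.0%}")  # → 70%
```

With these made-up labels the grader is right on 7 of 10 submissions, mirroring the 70% figure reported in the paper.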

Strengths and Potential for GPT in Data Visualization

The evaluation revealed GPT's potential for executing a variety of visualization tasks, such as cleaning up data, working with visualization libraries, and building interactive elements. For instance, GPT-4 demonstrated an ability to interact with visualizations by dispatching JavaScript events, and it offered insights into complex visualizations. Despite these strengths, researchers also observed limits to GPT's reasoning when a solution spanned multiple code files, or when input data included images, as the versions tested did not support image processing.
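To give a flavor of the "data cleanup" tasks mentioned above, the sketch below shows the kind of routine GPT was asked to produce: parsing a messy CSV, trimming whitespace, coercing numeric fields, and dropping unrecoverable rows. The dataset and column names are invented for illustration and are not from the course assignments.

```python
import csv
import io

# Hypothetical messy input: stray spaces and a non-numeric value.
raw = """city,population
Boston, 675647
Cambridge,118403
Somerville,n/a
 Medford ,59659
"""

def clean_rows(text):
    """Parse CSV text, strip whitespace, and keep only rows whose
    population field parses as an integer."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        city = rec["city"].strip()
        try:
            population = int(rec["population"].strip())
        except ValueError:
            continue  # drop rows with unparseable population values
        rows.append({"city": city, "population": population})
    return rows

print(clean_rows(raw))
```

Running this drops the `Somerville` row (population `n/a`) and normalizes the remaining three records, which is representative of the preprocessing work the study probed.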

Challenges, Ethical Considerations, and Educational Implications

The paper highlighted not only these technical abilities but also underscored certain concerns. Researchers noted instances of "hallucination," where GPT made assumptions about images it could not actually see, and "ethical constraints," where it refused tasks it deemed to be cheating. As part of a broader discussion on the implications of utilizing GPT and similar models, the paper stresses examining their output critically, especially since these models might reflect or amplify biases present in their training data.

An important subject triggered by the findings is the potential need to redesign educational approaches in visualization. The results suggest an opportunity to re-evaluate how visualization is taught, moving away from tasks AI models can easily perform towards those requiring unique human judgment and creativity. Moreover, instructors are encouraged to explore AI incorporation into the curriculum, creating a collaborative learning environment that both leverages and scrutinizes these AI tools.

The investigation presented in the paper invites numerous avenues for further research in GPT's role within the field of data visualization, pushing forward the discussion about AI's expanding reach in academia and professional settings.
