Visualization Generation with Large Language Models: An Evaluation
Abstract: Analysts frequently create visualizations during data analysis to obtain and communicate insights. To reduce the burden of creating visualizations, previous research has developed various approaches that let analysts create visualizations from natural language queries. Recent studies have demonstrated the capabilities of LLMs in natural language understanding and code generation, which suggests the potential of using LLMs to generate visualization specifications from natural language queries. In this paper, we evaluate the capability of an LLM on the task of natural language to visualization (NL2VIS). Specifically, we choose GPT-3.5 as the LLM and Vega-Lite as the visualization specification language, and conduct the evaluation on the nvBench dataset using both zero-shot and few-shot prompting strategies. The results show that GPT-3.5 surpasses previous NL2VIS approaches, and that few-shot prompts outperform zero-shot prompts. We discuss the limitations of GPT-3.5 on NL2VIS, such as misunderstanding data attributes and producing grammar errors in generated specifications. We also summarize several directions for improving the NL2VIS benchmark, such as correcting the ground truth and reducing ambiguities in natural language queries.
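The zero-shot and few-shot prompting strategies mentioned above can be illustrated with a minimal sketch. The templates and helper names below are hypothetical placeholders, not the paper's actual prompts: the idea is that a zero-shot prompt contains only the table schema and the query, while a few-shot prompt prepends (query, Vega-Lite spec) demonstrations before the task.

```python
import json

# Hypothetical zero-shot template: only the task description, no examples.
ZERO_SHOT_TEMPLATE = (
    "Given a table with columns {columns}, write a Vega-Lite "
    "specification (JSON) that answers: {query}"
)

def build_zero_shot_prompt(columns, query):
    """Assemble a zero-shot NL2VIS prompt from a schema and a query."""
    return ZERO_SHOT_TEMPLATE.format(columns=", ".join(columns), query=query)

def build_few_shot_prompt(examples, columns, query):
    """Few-shot: prepend (query, spec) demonstrations before the task."""
    demos = "\n\n".join(
        f"Query: {q}\nVega-Lite: {json.dumps(spec)}" for q, spec in examples
    )
    return demos + "\n\n" + build_zero_shot_prompt(columns, query)

# A minimal valid Vega-Lite bar-chart spec used as a demonstration.
demo_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "department", "type": "nominal"},
        "y": {"field": "salary", "aggregate": "mean",
              "type": "quantitative"},
    },
}

prompt = build_few_shot_prompt(
    [("Show average salary per department.", demo_spec)],
    ["name", "department", "salary"],
    "Plot the number of employees in each department as a bar chart.",
)
print(prompt)
```

The resulting string would then be sent to the model (e.g. via a chat-completion API call), and the model's JSON output parsed and rendered by a Vega-Lite runtime; that round trip is omitted here to keep the sketch self-contained.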