Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Prompt4Vis: Prompting Large Language Models with Example Mining and Schema Filtering for Tabular Data Visualization (2402.07909v1)

Published 29 Jan 2024 in cs.HC, cs.AI, cs.CL, and cs.DB

Abstract: Data visualization (DV) systems are increasingly recognized for their profound capability to uncover insights from vast datasets, gaining attention across both industry and academia. Crafting data queries is an essential process within certain declarative visualization languages (DVLs, e.g., Vega-Lite, EChart.). The evolution of NLP technologies has streamlined the use of natural language interfaces to visualize tabular data, offering a more accessible and intuitive user experience. However, current methods for converting natural language questions into data visualization queries, such as Seq2Vis, ncNet, and RGVisNet, despite utilizing complex neural network architectures, still fall short of expectations and have great room for improvement. LLMs such as ChatGPT and GPT-4, have established new benchmarks in a variety of NLP tasks, fundamentally altering the landscape of the field. Inspired by these advancements, we introduce a novel framework, Prompt4Vis, leveraging LLMs and in-context learning to enhance the performance of generating data visualization from natural language. Prompt4Vis comprises two key components: (1) a multi-objective example mining module, designed to find out the truly effective examples that strengthen the LLM's in-context learning capabilities for text-to-vis; (2) a schema filtering module, which is proposed to simplify the schema of the database. Extensive experiments through 5-fold cross-validation on the NVBench dataset demonstrate the superiority of Prompt4Vis, which notably surpasses the state-of-the-art (SOTA) RGVisNet by approximately 35.9% and 71.3% on dev and test sets, respectively. To the best of our knowledge, Prompt4Vis is the first work that introduces in-context learning into the text-to-vis for generating data visualization queries.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (45)
  1. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. CoRR abs/2302.04023 (2023).
  2. Visual Prompting via Image Inpainting. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  3. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
  4. PaLM: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 24 (2023), 240:1–240:113.
  5. Text-to-Viz: Automatic Generation of Infographics from Proportion-Related Natural Language Statements. IEEE Trans. Vis. Comput. Graph. 26, 1 (2020), 906–916.
  6. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171–4186.
  7. Victor Dibia. 2023. LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, ACL 2023, Toronto, Canada, July 10-12, 2023, Danushka Bollegala, Ruihong Huang, and Alan Ritter (Eds.). Association for Computational Linguistics, 113–126.
  8. Victor Dibia and Çagatay Demiralp. 2019. Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks. IEEE Computer Graphics and Applications 39, 5 (2019), 33–46.
  9. C3: Zero-shot Text-to-SQL with ChatGPT. CoRR abs/2307.07306 (2023).
  10. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. CoRR abs/2308.15363 (2023). https://doi.org/10.48550/ARXIV.2308.15363 arXiv:2308.15363
  11. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 6894–6910.
  12. Pat Hanrahan. 2006. VizQL: a language for query, analysis and visualization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006, Surajit Chaudhuri, Vagelis Hristidis, and Neoklis Polyzotis (Eds.). ACM, 721.
  13. Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models. CoRR abs/2309.10245 (2023). https://doi.org/10.48550/ARXIV.2309.10245 arXiv:2309.10245
  14. ECharts: A declarative framework for rapid construction of web-based visualization. Vis. Informatics 2, 2 (2018), 136–146.
  15. What Makes Good In-Context Examples for GPT-3?. In Proceedings of Deep Learning Inside Out: The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, DeeLIO@ACL 2022, Dublin, Ireland and Online, May 27, 2022, Eneko Agirre, Marianna Apidianaki, and Ivan Vulic (Eds.). Association for Computational Linguistics, 100–114.
  16. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 8086–8098.
  17. DeepEye: Towards Automatic Data Visualization. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018. IEEE Computer Society, 101–112.
  18. DeepEye: Creating Good Data Visualizations by Keyword Search. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM, 1733–1736.
  19. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 1235–1247.
  20. Natural Language to Visualization by Neural Machine Translation. IEEE Trans. Vis. Comput. Graph. 28, 1 (2022), 217–226.
  21. Faithful Chain-of-Thought Reasoning. CoRR abs/2301.13379 (2023). https://doi.org/10.48550/ARXIV.2301.13379
  22. Paula Maddigan and Teo Susnjak. 2023. Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models. IEEE Access 11 (2023), 45181–45193.
  23. Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco. IEEE Trans. Vis. Comput. Graph. 25, 1 (2019), 438–448.
  24. Can Foundation Models Wrangle Your Data? Proc. VLDB Endow. 16, 4 (2022), 738–746.
  25. NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries. IEEE Trans. Vis. Comput. Graph. 27, 2 (2021), 369–379.
  26. Tai Nguyen and Eric Wong. 2023. In-context Example Selection with Influences. CoRR abs/2302.11042 (2023). https://doi.org/10.48550/ARXIV.2302.11042 arXiv:2302.11042
  27. Show Your Work: Scratchpads for Intermediate Computation with Language Models. CoRR abs/2112.00114 (2021). arXiv:2112.00114 https://arxiv.org/abs/2112.00114
  28. OpenAI. 2022. Introducing chatgpt. https://openai.com/blog/chatgpt.
  29. OpenAI. 2023. Gpt-4 technical report. arXiv:2303.08774 [cs.DL]
  30. Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. CoRR abs/2304.11015 (2023). https://doi.org/10.48550/ARXIV.2304.11015 arXiv:2304.11015
  31. Learning to Recommend Visualizations from Data. In KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021, Feida Zhu, Beng Chin Ooi, and Chunyan Miao (Eds.). ACM, 1359–1369.
  32. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 1339–1384.
  33. Learning To Retrieve Prompts for In-Context Learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Marine Carpuat, Marie-Catherine de Marneffe, and Iván Vladimir Meza Ruíz (Eds.). Association for Computational Linguistics, 2655–2671.
  34. Vega-Lite: A Grammar of Interactive Graphics. IEEE Trans. Vis. Comput. Graph. 23, 1 (2017), 341–350.
  35. Significance of Patterns in Data Visualisations. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis (Eds.). ACM, 1509–1517.
  36. Effortless Data Exploration with zenvisage: An Expressive and Interactive Visual Analytics System. Proc. VLDB Endow. 10, 4 (2016), 457–468.
  37. RGVisNet: A Hybrid Retrieval-Generation Neural Framework Towards Automatic Data Visualization Generation. In KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022, Aidong Zhang and Huzefa Rangwala (Eds.). ACM, 1646–1655.
  38. Towards Visualization Recommendation Systems. SIGMOD Rec. 45, 4 (2016), 34–39.
  39. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998–6008.
  40. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics, 7567–7578.
  41. Data Formulator: AI-Powered Concept-Driven Visualization Authoring. IEEE Trans. Vis. Comput. Graph. 30, 1 (2024), 1128–1138.
  42. LLM4Vis: Explainable Visualization Recommendation using ChatGPT. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023 - Industry Track, Singapore, December 6-10, 2023, Mingxuan Wang and Imed Zitouni (Eds.). Association for Computational Linguistics, 675–692.
  43. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.).
  44. Hadley Wickham. 2009. ggplot2 - Elegant Graphics for Data Analysis. Springer. https://doi.org/10.1007/978-0-387-98141-3
  45. What Makes Good Examples for Visual In-Context Learning?
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Shuaimin Li (6 papers)
  2. Xuanang Chen (14 papers)
  3. Yuanfeng Song (27 papers)
  4. Yunze Song (3 papers)
  5. Chen Zhang (403 papers)
Citations (3)
X Twitter Logo Streamline Icon: https://streamlinehq.com