The paper "Visualization Generation with LLMs: An Evaluation" explores the potential of using LLMs, specifically GPT-3.5, to generate visualization specifications from natural language queries. This topic is important because data visualization is a key part of data analysis, and automating this process can significantly streamline analytical workflows for researchers and professionals who may not be experts in visualization design but need to communicate insights effectively.
Background and Relevance
Data visualization helps uncover patterns and communicate insights from data analysis. Creating effective visualizations is traditionally a skill-intensive task that requires knowledge of visualization design principles. Automating this step from natural language queries can save time and effort, allowing analysts to focus on insights rather than the mechanics of chart construction. This paper evaluates how well LLMs can perform that automation, translating natural language into visualization specifications.
Explanation of Key Concepts
- Natural Language to Visualization (NL2VIS): This task converts plain-language descriptions into graphical representations of data. The evaluation examines how well GPT-3.5, an advanced LLM, handles this conversion when the target output is Vega-Lite, a widely used declarative visualization grammar (a minimal example specification is sketched after this list).
- Prompt Strategies: The paper examines different strategies for prompting the LLM. Two key strategies are compared:
- Zero-shot prompts: The LLM is given no examples or guidance, relying purely on its pretrained capabilities.
- Few-shot prompts: The LLM is given a few example queries with their corresponding visualization specifications to guide its responses (see the prompt-construction sketch after this list).
- nvBench Dataset: This benchmark dataset is used to evaluate the LLM's performance. It contains a large collection of natural language queries paired with ground-truth visualizations.
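To make the NL2VIS target concrete, the sketch below shows the kind of Vega-Lite specification a query such as "show the average horsepower for each origin as a bar chart" could map to. The query, field names, and data source are illustrative assumptions, not examples taken from the paper or from nvBench.

```python
# Hypothetical NL2VIS example: the Vega-Lite specification (written here as a
# Python dict) that an LLM might produce for the query
# "show the average horsepower for each origin as a bar chart".
vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "cars.json"},  # hypothetical data source
    "mark": "bar",
    "encoding": {
        "x": {"field": "Origin", "type": "nominal"},
        "y": {"field": "Horsepower", "type": "quantitative", "aggregate": "mean"},
    },
}
```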
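The difference between the two prompting strategies can also be illustrated with a small sketch. The instruction wording and the example pairs below are assumptions for illustration, not the paper's actual prompts.

```python
# Illustrative zero-shot and few-shot prompt construction for NL2VIS.
ZERO_SHOT_TEMPLATE = (
    "Translate the following natural language query into a Vega-Lite "
    "specification. Return JSON only.\n"
    "Query: {query}\n"
    "Specification:"
)

def build_few_shot_prompt(query, examples):
    """examples: list of (nl_query, vega_lite_json) pairs, e.g. drawn from nvBench."""
    parts = [
        "Translate each natural language query into a Vega-Lite specification. "
        "Return JSON only."
    ]
    for nl_query, spec_json in examples:
        parts.append(f"Query: {nl_query}\nSpecification: {spec_json}")
    parts.append(f"Query: {query}\nSpecification:")
    return "\n\n".join(parts)
```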
Evaluation Process
The evaluation uses comparison metrics to assess the accuracy of visualizations generated by GPT-3.5. The generated visualizations are compared to ground-truth results based on both their visual content and their underlying data structures.
- Matching Accuracy: This assesses whether the generated visualizations match the expected output. Two methods were used:
- Pixel-based method: Compares the rendered images pixel by pixel, a very strict measure.
- SVG-JSON-based method: Compares the chart type and the underlying data encoded in the specification, avoiding trivial mismatches caused by minor rendering differences (a simplified comparison sketch follows this list).
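The sketch below illustrates one possible specification-level check in the spirit of the second method; it is an assumed simplification, not the paper's actual SVG-JSON comparison logic.

```python
# Assumed, simplified specification-level matching: two Vega-Lite specs are
# treated as equivalent if they use the same mark (chart type) and encode the
# same fields with the same data types on each channel.
def specs_match(generated: dict, expected: dict) -> bool:
    if generated.get("mark") != expected.get("mark"):
        return False
    gen_enc = generated.get("encoding", {})
    exp_enc = expected.get("encoding", {})
    for channel, exp_def in exp_enc.items():
        gen_def = gen_enc.get(channel, {})
        if (gen_def.get("field"), gen_def.get("type")) != (
            exp_def.get("field"),
            exp_def.get("type"),
        ):
            return False
    return True
```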
Findings and Recommendations
- Performance of LLM: The few-shot prompting strategy significantly outperformed the zero-shot approach, indicating that example-based guidance helps the LLM handle more complex queries.
- Common Errors: Despite promising results, the LLM sometimes misinterprets data attributes or produces invalid Vega-Lite syntax. Clearer guidance on these points could further improve performance.
- Improving Benchmarks: Some inconsistencies were found in the nvBench dataset itself, such as queries with ambiguous chart types or unstated time units. Future benchmarks would benefit from clearer task descriptions and correct query-to-visualization mappings.
- Potential for Linting Tools: Developing tools that check and correct Vega-Lite syntax could further refine LLM outputs, offering a practical way to reduce errors in specification generation (a validation sketch follows this list).
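As one illustration of such a linting step, the sketch below validates a generated specification against the official Vega-Lite JSON schema using the third-party jsonschema package. This is an assumed minimal approach, not a tool described in the paper, and it only flags problems rather than repairing them.

```python
import json
import urllib.request

import jsonschema  # third-party: pip install jsonschema

VEGA_LITE_SCHEMA_URL = "https://vega.github.io/schema/vega-lite/v5.json"


def lint_spec(spec_text: str) -> list[str]:
    """Return a list of problems found in an LLM-generated Vega-Lite spec."""
    problems = []
    try:
        spec = json.loads(spec_text)  # the model may emit malformed JSON
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    # Validate the parsed spec against the official Vega-Lite schema.
    with urllib.request.urlopen(VEGA_LITE_SCHEMA_URL) as resp:
        schema = json.load(resp)
    try:
        jsonschema.validate(instance=spec, schema=schema)
    except jsonschema.ValidationError as exc:
        problems.append(f"schema violation: {exc.message}")
    return problems
```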
Overall, the evaluation highlights both the potential and the current limitations of using LLMs for visualization automation. The findings point to opportunities both for improving LLMs through better training data and for strengthening benchmarks to enable more accurate evaluation.