Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks (1804.03126v3)

Published 9 Apr 2018 in cs.HC, cs.AI, and cs.LG

Abstract: Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper we introduce Data2Vis, a neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a sequence to sequence translation problem where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based recurrent neural network (RNN) with long short-term memory (LSTM) units on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean) and how to use common data selection patterns that occur within data visualizations. Data2Vis generates visualizations that are comparable to manually-created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.

Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks

The paper "Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks" presents a novel approach to streamline the creation of data visualizations by leveraging the power of deep neural networks to automate the visualization specification process. This research seeks to bridge the gap between data visualization and deep learning, proposing a methodology that conceptualizes visualization design as a translation problem.

Overview and Methodology

The authors introduce Data2Vis, an end-to-end trainable neural network that maps data specifications to visualization specifications. Employing a sequence-to-sequence encoder-decoder architecture with long short-term memory (LSTM) units and an attention mechanism, Data2Vis treats visualization generation as a task akin to language translation. The model is trained on a corpus of Vega-Lite visualization specifications and learns the vocabulary, syntax, and transformations required to produce valid, effective visualizations.
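To ground the architecture, the sketch below implements a character-level sequence-to-sequence model with attention in PyTorch. It is an illustrative approximation, not the authors' code: the paper builds a two-layer bidirectional LSTM encoder-decoder with a TensorFlow seq2seq framework, and the layer sizes, the Luong-style attention variant, and all names here are assumptions.

```python
# Illustrative character-level seq2seq model with Luong-style attention.
# Layer sizes, attention variant, and names are assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn


class Seq2SeqWithAttention(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder reads a JSON-serialized data record, character by character.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        # Decoder emits the Vega-Lite specification, character by character.
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.attn = nn.Linear(hidden_dim, hidden_dim)     # attention scoring
        self.out = nn.Linear(2 * hidden_dim, vocab_size)  # decoder state + context

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.embed(src))       # (B, S, H)
        # Teacher forcing: decode conditioned on the gold target prefix.
        dec_out, _ = self.decoder(self.embed(tgt), state)    # (B, T, H)
        # Attend: score every encoder step against every decoder step,
        # then mix encoder states into a per-step context vector.
        scores = dec_out @ self.attn(enc_out).transpose(1, 2)   # (B, T, S)
        context = torch.softmax(scores, dim=-1) @ enc_out       # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, vocab)


# Smoke test with random token ids (batch of 2, source len 50, target len 80).
model = Seq2SeqWithAttention(vocab_size=100)
logits = model(torch.randint(0, 100, (2, 50)), torch.randint(0, 100, (2, 80)))
print(logits.shape)  # torch.Size([2, 80, 100])
```

In training, src would hold a tokenized data record and tgt the corresponding Vega-Lite specification, with a standard cross-entropy loss over the output characters.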

In qualitative evaluations, the model generates visualizations comparable to manually created ones, reducing both the time and expertise that visualization tasks require. It capitalizes on the declarative nature of Vega-Lite, whose grammar is concise and clear while remaining expressive.
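To make the target language concrete, below is a small Vega-Lite specification of the kind the decoder is trained to emit, shown as a Python dict. It exercises two of the transformations named in the abstract (bin and count); the dataset URL and field name are illustrative placeholders, not outputs reported in the paper.

```python
# A compact Vega-Lite specification of the kind Data2Vis emits:
# a histogram that bins a quantitative field and counts records.
# The data URL and field name are placeholders for illustration.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
    "data": {"url": "data/cars.json"},
    "mark": "bar",
    "encoding": {
        "x": {"bin": True, "field": "Horsepower", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}
```

Because such specifications are short, declarative JSON documents, they are well suited to character-level sequence modeling.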

Key Contributions and Numerical Results

  • Conceptual Formulation: The paper's primary contribution is the conceptualization of visualization specification as a sequence-to-sequence translation problem, enabling the practical application of neural machine translation techniques to the domain of data visualization.
  • Model Implementation: The Data2Vis model generates visualization specifications autonomously from given datasets, illustrating its potential both as a practical tool that eases visualization authoring for novices and as an aid that helps experts initiate visualization designs.
  • Qualitative Analysis: The paper provides several examples where Data2Vis successfully creates univariate and multivariate visualizations, handling attributes such as categorical responses and demographic variables. The model achieves a log perplexity score of 0.032, underscoring its proficiency in generating plausible visualization specifications; a short sketch of this metric follows the list.
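Log perplexity here is the mean negative log-likelihood per generated token, so a value near zero means the model is nearly certain of each next character in a valid specification. A minimal illustration of the computation (the per-character probabilities below are made-up stand-ins, not numbers from the paper):

```python
import math

# Log perplexity = mean negative log-likelihood per predicted token.
# These per-character probabilities are hypothetical placeholders.
token_probs = [0.99, 0.97, 0.95, 0.99, 0.98]
log_perplexity = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"log perplexity = {log_perplexity:.3f}")             # ~0.024
print(f"perplexity     = {math.exp(log_perplexity):.3f}")   # ~1.025
# The reported 0.032 corresponds to a perplexity of exp(0.032) ≈ 1.03,
# i.e. the model is almost certain of each character it emits.
```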

Implications for AI and Visualization Research

The introduction of Data2Vis has significant implications for both theoretical and practical aspects of AI and data visualization research. It represents a shift towards machine learning-driven approaches for visualization design, potentially leading to enhanced scalability and adaptability of visualization systems. By learning from examples rather than relying solely on preset rules and heuristics, Data2Vis offers a foundation for future systems capable of synthesizing more complex visualization strategies and responding dynamically to diverse data contexts.

Speculation on Future Developments

The research opens pathways to broader applications, including the integration of natural language processing for visualization specification through textual descriptions and the development of generative models for diversified visualization outputs. Future work may focus on expanding the model's capacity to handle larger datasets, complex transformations, and interactions, ultimately contributing to more nuanced and comprehensive data exploration tools.

In conclusion, Data2Vis exemplifies a promising intersection of AI and data visualization, establishing a robust baseline for subsequent research and development efforts aimed at automating and enhancing visual data exploration. While current work highlights some limitations, ongoing advancements in deep learning and data synthesis approaches will likely address these challenges, solidifying the role of automated visualization systems in data-driven decision-making.

Authors (2)
  1. Victor Dibia (15 papers)
  2. Çağatay Demiralp (38 papers)
Citations (172)