Tensor Fields for Data Extraction from Chart Images: Bar Charts and Scatter Plots (2010.02319v1)

Published 5 Oct 2020 in cs.CV, cs.GR, cs.NA, and math.NA

Abstract: Charts are an essential part of both graphicacy (graphical literacy), and statistical literacy. As chart understanding has become increasingly relevant in data science, automating chart analysis by processing raster images of the charts has become a significant problem. Automated chart reading involves data extraction and contextual understanding of the data from chart images. In this paper, we perform the first step of determining the computational model of chart images for data extraction for selected chart types, namely, bar charts, and scatter plots. We demonstrate the use of positive semidefinite second-order tensor fields as an effective model. We identify an appropriate tensor field as the model and propose a methodology for the use of its degenerate point extraction for data extraction from chart images. Our results show that tensor voting is effective for data extraction from bar charts and scatter plots, and histograms, as a special case of bar charts.

Tensor Fields for Data Extraction from Chart Images: An Academic Review

The paper under review investigates a novel methodology for data extraction from raster images of charts, focusing specifically on bar charts and scatter plots. This work is situated within the broader context of graphicacy and the challenges posed by the ubiquity of rasterized chart images lacking source data. The research centers on developing a computational model that utilizes positive semidefinite second-order tensor fields to automate portions of Kimura's scheme of statistical literacy.

The methodology uses tensor voting and structure tensor techniques to characterize and extract geometric features from raster images. This approach avoids deep learning models, which are limited in handling the vast design space of chart images. Specifically, the researchers identify degenerate points in the tensor fields to locate critical geometric components, such as the corners of bars in bar charts and the centroids of points in scatter plots.
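To make the idea of locating corners from a tensor field concrete, here is a minimal NumPy sketch of a structure tensor on a synthetic one-bar image. It is an assumption-laden stand-in, not the authors' method. It uses the structure tensor rather than tensor voting. Instead of the paper's degenerate point extraction, it uses a Shi-Tomasi-style rule: keep pixels where the smaller eigenvalue is large, which forces both eigenvalues to be large. The function names (`gaussian_kernel`, `smooth`, `structure_tensor_eigenvalues`, `corner_candidates`) and the parameters `sigma` and `rel` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Normalized 1-D Gaussian truncated at about 3 sigma.
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    # Separable Gaussian blur: filter rows, then columns (zero padding at borders).
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def structure_tensor_eigenvalues(img, sigma=1.5):
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(img.astype(float))
    jxx = smooth(gx * gx, sigma)
    jxy = smooth(gx * gy, sigma)
    jyy = smooth(gy * gy, sigma)
    # Closed-form eigenvalues of the symmetric 2x2 tensor [[jxx, jxy], [jxy, jyy]].
    mean = 0.5 * (jxx + jyy)
    root = np.sqrt((0.5 * (jxx - jyy)) ** 2 + jxy**2)
    return mean + root, mean - root  # lam1 >= lam2 (both >= 0 up to rounding)

def corner_candidates(img, sigma=1.5, rel=0.5):
    # Hypothetical criterion: the smaller eigenvalue is a large fraction of its
    # maximum. Edges give lam2 ~ 0 and flat areas give lam1 ~ lam2 ~ 0, so only
    # corner-like, nearly isotropic neighbourhoods survive.
    _, lam2 = structure_tensor_eigenvalues(img, sigma)
    return np.argwhere(lam2 >= rel * lam2.max())  # (row, col) pixel coordinates

img = np.zeros((40, 40))
img[10:30, 12:28] = 1.0  # a single filled bar
pts = corner_candidates(img)
print(pts)
```

The candidate pixels form small clumps around the four bar corners. These clumps still need to be merged into single points before they can be read off as data.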

Key Contributions

Use of Second-Order Tensor Fields: The paper underscores the efficacy of using positive semidefinite second-order tensor fields as a computational model for feature extraction from chart images. This stands in contrast to methods that primarily rely on object detection or deep learning, which often fall short in handling the variability inherent in chart designs.
Degenerate Point Extraction: The authors base data extraction on the degenerate points of the tensor fields, that is, points where the tensor's two eigenvalues coincide. These points capture salient geometric features, such as bar corners and scatter point centroids, whose pixel positions correspond directly to chart data values. This makes it possible to reconstruct the charts in pixel space.
Chart Image Preprocessing: The paper applies preprocessing steps, such as morphological operations, to reduce aliasing and distortion in chart images. This preprocessing improves the fidelity of data extraction from raster images.
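Extracted corner positions become data values only after they are mapped through the chart's axis. The sketch below illustrates that step for vertical bars. It assumes a linear y-axis that has been calibrated from two labelled ticks whose pixel rows are already known, for example from OCR. That calibration step and the names `LinearAxis` and `bar_heights` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class LinearAxis:
    """Hypothetical calibration of a linear axis from two labelled ticks."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        # Linear interpolation (and extrapolation) between the two reference ticks.
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale

def bar_heights(top_corner_pairs, y_axis):
    """Average the row of each bar's two top corners, then map it to data units."""
    return [y_axis.to_value((left[0] + right[0]) / 2.0) for left, right in top_corner_pairs]

# Assumed calibration: pixel row 300 is the value 0 and row 100 is 50 (rows grow downward).
y_axis = LinearAxis(pixel_a=300, value_a=0.0, pixel_b=100, value_b=50.0)
pairs = [((180, 40), (180, 70)), ((260, 90), (261, 120))]  # (row, col) corner pairs
heights = bar_heights(pairs, y_axis)
print(heights)  # [30.0, 9.875]
```

Averaging the two top corners absorbs one-pixel disagreements between them. Logarithmic or otherwise non-linear axes would need a different `to_value`.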

Numerical Results and Implications

The authors report numerical results showing accurate data retrieval: the Earth Mover's Distance (EMD) between the extracted and original data is low for both bar charts and scatter plots. They also report substantially fewer false positives and false negatives than existing methods such as Scatteract, particularly for scatter plots.
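For readers unfamiliar with the metric, the sketch below computes EMD for the simplest case: two equal-size sets of equally weighted values or points. The exact normalization and weighting used in the paper are not stated here, so treat these functions as an illustration of the metric, not a reproduction of the reported evaluation. In 1-D, the optimal transport plan pairs sorted values in order. In 2-D with uniform weights, EMD reduces to the cheapest one-to-one assignment, which this sketch finds by brute force and which is therefore only suitable for tiny sets. The helper names are hypothetical.

```python
import itertools
import math

def emd_1d(a, b):
    """EMD between two equal-size 1-D samples with uniform weights.
    In 1-D the optimal plan matches sorted values in order."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def emd_2d_bruteforce(p, q):
    """EMD between two small, equal-size 2-D point sets with uniform weights:
    the minimum-cost one-to-one assignment, found by exhaustive search (O(n!))."""
    if len(p) != len(q):
        raise ValueError("this sketch assumes equal-size point sets")
    best = min(
        sum(math.dist(p[i], q[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(q)))
    )
    return best / len(p)

# Illustrative values only, not data from the paper.
bars_original = [30, 10, 45]
bars_extracted = [29.5, 10.0, 46.0]
print(emd_1d(bars_original, bars_extracted))  # 0.5

points_original = [(0, 0), (3, 4)]
points_extracted = [(3, 4), (0, 1)]
print(emd_2d_bruteforce(points_original, points_extracted))  # 0.5
```

For realistic point counts, the assignment would be solved with a polynomial-time method such as the Hungarian algorithm rather than by enumerating permutations.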

Theoretical and Practical Implications

From a theoretical standpoint, this research contributes to the study of tensor field topology in geometric analysis and proposes a viable model for levels A1 and A2 of statistical literacy in Kimura's scheme. Practically, the method applies wherever data must be recovered from charts whose source data is unavailable, for example in archived documents and educational materials.

Future Directions

The paper points toward extending these methods to more complex chart types, such as pie charts, which involve non-linear geometric mappings. Better clustering of degenerate points could also improve data retrieval accuracy, especially for dense or overlapping points in scatter plots.
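To show why clustering matters, here is a minimal sketch of one simple option: single-linkage grouping of nearby candidate points with a fixed radius, followed by taking each group's centroid as the recovered marker position. The paper's own clustering procedure is not described in this review, so the function `cluster_points` and its `radius` parameter are hypothetical.

```python
import math

def cluster_points(points, radius):
    """Single-linkage clustering: points within `radius` of each other (directly or
    through a chain of neighbours) share a cluster. Returns sorted cluster centroids."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    centroids = []
    for members in groups.values():
        n = len(members)
        centroids.append(tuple(sum(coord) / n for coord in zip(*members)))
    return sorted(centroids)

# Illustrative candidate pixels: a clump of three and a clump of two.
candidates = [(10, 10), (11, 10), (10, 11), (30, 30), (31, 31)]
centroids = cluster_points(candidates, radius=2)
print(centroids)
```

A fixed radius illustrates the difficulty noted above. When it is too large, two genuinely distinct but overlapping markers collapse into one centroid, which produces a false negative. When it is too small, one marker splits into several centroids, which produces false positives.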

In conclusion, the paper offers an effective geometric approach to automating chart data extraction, with potential applications in educational technology, archive management, and accessibility tools for visually impaired readers. By grounding extraction in the analysis of tensor fields, it opens avenues for further research on tensor field applications in data visualization and computational geometry.
