GenPlot: Increasing the Scale and Diversity of Chart Derendering Data (2306.11699v1)
Published 20 Jun 2023 in cs.CV
Abstract: Vertical bar, horizontal bar, dot, scatter, and line plots provide a diverse set of visualizations to represent data. To understand these plots, one must be able to recognize textual components, locate data points in a plot, and process diverse visual contexts to extract information. In recent works such as Pix2Struct, MatCha, and DePlot, OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks. These results highlight the importance of chart derendering as a pre-training objective, yet existing datasets provide only a fixed set of training examples. In this paper, we propose GenPlot, a plot generator that can generate billions of additional plots for chart derendering using synthetic data.
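To make the idea of synthetic chart-derendering data concrete, the following is a minimal sketch and not GenPlot's actual implementation. It samples a random chart specification, linearizes the underlying data into a text target, and optionally renders the image with Matplotlib. The function names (`sample_plot_spec`, `to_target_text`, `render`), the spec dictionary layout, the value ranges, and the `x | y` target format are all assumptions made for illustration.

```python
import random

# The five chart families named in the abstract.
CHART_TYPES = ["vertical_bar", "horizontal_bar", "dot", "scatter", "line"]


def sample_plot_spec(rng):
    """Sample a chart type and a random data series (hypothetical spec format)."""
    chart_type = rng.choice(CHART_TYPES)
    n = rng.randint(3, 8)
    if chart_type == "scatter":
        # Scatter plots get numeric x values.
        xs = [round(rng.uniform(0, 100), 1) for _ in range(n)]
    else:
        # Other chart types get categorical x labels.
        xs = [f"cat_{i}" for i in range(n)]
    ys = [round(rng.uniform(0, 100), 1) for _ in range(n)]
    return {"chart_type": chart_type, "x": xs, "y": ys}


def to_target_text(spec):
    """Linearize the data into a text target (assumed format: one 'x | y' row per point)."""
    rows = [f"{x} | {y}" for x, y in zip(spec["x"], spec["y"])]
    return "\n".join([spec["chart_type"]] + rows)


def render(spec, path):
    """Render the spec to an image file; Matplotlib is imported lazily."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 4))
    x, y, kind = spec["x"], spec["y"], spec["chart_type"]
    if kind == "vertical_bar":
        ax.bar(x, y)
    elif kind == "horizontal_bar":
        ax.barh(x, y)
    elif kind == "line":
        ax.plot(x, y, marker="o")
    else:
        # Simplification: dot and scatter plots are both drawn as markers.
        ax.scatter(x, y)
    fig.savefig(path, dpi=100)
    plt.close(fig)


if __name__ == "__main__":
    rng = random.Random(0)
    spec = sample_plot_spec(rng)
    print(to_target_text(spec))
    # render(spec, "plot_0.png")  # uncomment if Matplotlib is installed
```

Because every image is drawn from a sampled specification, the image and its ground-truth text are produced together, so the number of training pairs is limited only by how many specifications are sampled. A real generator would also vary fonts, colors, tick formats, and axis scales.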
- VQA: Visual question answering.
- Reading and reasoning over chart images for evidence-based automated fact-checking. In Findings of the Association for Computational Linguistics: EACL 2023, pages 399–414, Dubrovnik, Croatia. Association for Computational Linguistics.
- Benetech - making graphs accessible.
- PaLI: A jointly-scaled multilingual language-image model.
- Array programming with NumPy. Nature, 585(7825):357–362.
- J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95.
- DVQA: Understanding data visualizations via question answering.
- FigureQA: An annotated figure dataset for visual reasoning.
- PreSTU: Pre-training for scene-text understanding.
- OCR-free document understanding transformer.
- Pix2Struct: Screenshot parsing as pretraining for visual language understanding.
- DePlot: One-shot visual language reasoning by plot-to-table translation.
- MatCha: Enhancing visual language pretraining with math reasoning and chart derendering.
- ChartQA: A benchmark for question answering about charts with visual and logical reasoning.
- PlotQA: Reasoning over scientific plots.
- GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
- LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM.
Author: Brendan Artley