
FigureQA: An Annotated Figure Dataset for Visual Reasoning (1710.07300v2)

Published 19 Oct 2017 in cs.CV

Abstract: We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.

FigureQA: An Annotated Figure Dataset for Visual Reasoning

The paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning" presents a meticulously constructed dataset designed to advance research in machine comprehension of visual data, specifically focusing on scientific-style figures. This dataset, termed FigureQA, comprises over one million question-answer pairs derived from more than 100,000 synthetic images. These figures are classified into five widely-used types: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. The central objective is to develop systems capable of sophisticated reasoning tasks akin to those performed by humans in understanding visual data representations.

Dataset and Methodology

FigureQA distinguishes itself through its structured and interpretable design. Each figure is paired with numerous question-answer pairs generated from 15 templates, crafted to probe diverse relationships and characteristics within the plots, including maxima, minima, medians, area under the curve, smoothness, and intersections among plot elements. Answering these questions requires understanding and inference across multiple plot components, a demand that challenges existing machine learning models.
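
To make the template mechanism concrete, here is a minimal sketch of how yes/no question-answer pairs could be instantiated from a figure's underlying data. The figure record, helper name, and the two template wordings shown are illustrative; the full set of 15 templates is defined by the paper's generation pipeline.

```python
# Sketch of template-based QA generation over a hypothetical figure record.
# FigureQA names plot elements after colors; answers are binary yes/no.
import random

figure = {
    "type": "vertical_bar",
    "series": {"Dark Violet": 7.0, "Medium Mint": 3.5, "Sandy Brown": 9.2},
}

def generate_questions(fig):
    values = fig["series"]
    names = list(values)
    max_name = max(names, key=values.get)
    min_name = min(names, key=values.get)
    qa_pairs = []
    for name in names:
        # Extremum templates: answer is 1 (yes) or 0 (no).
        qa_pairs.append((f"Is {name} the maximum?", int(name == max_name)))
        qa_pairs.append((f"Is {name} the minimum?", int(name == min_name)))
    # Pairwise comparison template over two randomly chosen series.
    a, b = random.sample(names, 2)
    qa_pairs.append((f"Is {a} greater than {b}?", int(values[a] > values[b])))
    return qa_pairs

for question, answer in generate_questions(figure):
    print(question, "->", "Yes" if answer else "No")
```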

Complementing the images, the dataset provides the numerical data used to generate each figure and bounding-box annotations for all plot elements. This side information lets researchers formulate auxiliary objectives, such as supervising attention mechanisms with the bounding boxes or reconstructing the numerical data from visual inputs, and enables deeper probing of models' visual reasoning competencies.
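
As one example of such an auxiliary objective, the sketch below converts a bounding-box annotation into a binary mask that a model's attention map could be trained against. The {x, y, w, h} pixel format is an assumption made for illustration, not the dataset's documented schema.

```python
# Rasterize a plot element's bounding box into a binary target mask,
# usable as supervision for an attention or detection head.
import numpy as np

def bbox_to_mask(bbox, height, width):
    """Turn an {x, y, w, h} box (pixels, assumed format) into a 0/1 mask."""
    mask = np.zeros((height, width), dtype=np.float32)
    x, y, w, h = bbox["x"], bbox["y"], bbox["w"], bbox["h"]
    mask[y:y + h, x:x + w] = 1.0
    return mask

# Example: supervise attention over one bar in a 256x256 figure.
target = bbox_to_mask({"x": 40, "y": 60, "w": 12, "h": 80}, 256, 256)
print(target.sum())  # area of the box in pixels
```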

Models and Baseline Evaluations

The paper evaluates several baseline models on FigureQA. These range from a text-only model, which serves as a sanity check against linguistic biases, to more complex architectures: a Convolutional Neural Network (CNN) paired with a Long Short-Term Memory (LSTM) network, and the Relation Network (RN). The RN, designed for tasks requiring relational reasoning, emerged as the strongest performer, reaching 72.40% accuracy on the test set with the alternated color scheme. This result underscores the intricacy of the task and sets a benchmark for future advancements.
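
For context, the RN scores every pair of visual "objects" (CNN feature-map cells) together with the question embedding, sums over all pairs, and classifies the result: RN(O) = f(sum over i,j of g(o_i, o_j, q)). The PyTorch sketch below shows that aggregation; the layer sizes and the binary yes/no head are illustrative, not the paper's exact configuration.

```python
# Minimal Relation Network aggregation over all object pairs,
# conditioned on a question embedding.
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, obj_dim, q_dim, hidden=256):
        super().__init__()
        self.g = nn.Sequential(  # pairwise relation function g
            nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.f = nn.Sequential(  # readout function f
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # yes / no logits
        )

    def forward(self, objects, question):
        # objects: (B, N, obj_dim) feature-map cells; question: (B, q_dim)
        B, N, D = objects.shape
        o_i = objects.unsqueeze(2).expand(B, N, N, D)
        o_j = objects.unsqueeze(1).expand(B, N, N, D)
        q = question.unsqueeze(1).unsqueeze(1).expand(B, N, N, question.size(-1))
        pairs = torch.cat([o_i, o_j, q], dim=-1)   # all N*N object pairs
        relations = self.g(pairs).sum(dim=(1, 2))  # sum over pairs
        return self.f(relations)

logits = RelationModule(64, 128)(torch.randn(2, 49, 64), torch.randn(2, 128))
print(logits.shape)  # torch.Size([2, 2])
```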

Alternating color schemes between training and testing ensures that models cannot simply memorize color patterns, rewarding generalized learning over memorization. The results indicate a substantial challenge for current machine learning models: human performance significantly exceeds model accuracy, a gap that demands innovation in algorithm design.
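
A loose sketch of the swap idea follows, assuming a simple disjoint split of the color pool; the paper's exact partition scheme may differ.

```python
# Divide the color pool into disjoint halves so figures in the
# alternated evaluation sets never reuse training colors.
colors = sorted(["darkviolet", "mediumseagreen", "sandybrown",
                 "tomato", "steelblue", "khaki"])  # illustrative pool
half = len(colors) // 2
train_palette = colors[:half]       # train / validation1 / test1
alternated_palette = colors[half:]  # validation2 / test2
assert not set(train_palette) & set(alternated_palette)
```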

Implications and Future Directions

The introduction of FigureQA is a substantive contribution to the ongoing exploration of visual reasoning within AI. From a practical standpoint, improving the comprehension of figures could greatly enhance computational assistance in fields that heavily rely on data visualization, such as scientific research, data journalism, and business analytics. Theoretical advancements derived from FigureQA could refine our understanding of visual perception in AI and contribute to developing systems with more nuanced and human-like cognitive abilities in visual contexts.

Going forward, whether models trained on FigureQA can transfer their capabilities to real-world figure understanding remains a compelling research direction. Furthermore, iterative extensions of the dataset, incorporating larger sets of question templates or natural-language questions, could increase task complexity and foster continued advances in AI-driven visual reasoning.

Authors (6)
  1. Samira Ebrahimi Kahou (50 papers)
  2. Vincent Michalski (18 papers)
  3. Adam Atkinson (5 papers)
  4. Ákos Kádár (16 papers)
  5. Adam Trischler (50 papers)
  6. Yoshua Bengio (601 papers)
Citations (255)