ChartKG: Knowledge Graph for Chart Analysis
- ChartKG is a knowledge-graph-based formalism that integrates chart structure, semantics, and visual encodings for comprehensive chart analysis.
- The framework employs a multi-stage pipeline with chart classification using ResNet50, object detection with YOLOv5, and OCR for extracting chart elements, reporting 87.2% classification accuracy and per-class detection mAP@0.5:0.95 between 0.645 (lines) and 0.932 (bars).
- ChartKG supports semantic retrieval and visual question answering by mapping visual marks, data variables, and insights in a unified, query-friendly graph.
ChartKG is a knowledge-graph-based formalism for representing the structure, semantics, and visual encodings of information graphics such as bar, line, pie, and scatter charts. Developed to address the limitations of prior chart-to-data extraction methods—which often capture only tabular readings while losing visual encodings and semantic patterns—ChartKG encodes rich relationships between graphical marks, underlying data variables, and inferred visual insights in a unified, query-friendly graph format. The framework defines both an ontology and an end-to-end chart image parsing pipeline, enabling downstream applications such as semantic retrieval and visual question answering to exploit both the semantics and visual logic of chart images (Zhou et al., 2024).
1. Formal Structure of ChartKG
A ChartKG is a labeled, directed multigraph with a node set partitioned into semantic and visual ontological classes:
- Visual elements (e.g., bars, slices, points, axes)
- Visual element property values (e.g., "height = 50 px", "color = blue", "angle = 45°")
- Data variables (e.g., "Year", "Region")
- Data variable values (e.g., "2010", "Chrome", "ArabRegion")
- Visual insights (e.g., "increasingTrend", "dominantSlice", "outlier")
Edges are labeled by one of four relation types:
- Visual property correspondence, linking a visual element to its property values
- Data variable correspondence, linking a data variable to its values
- Visual encoding mapping, linking visual properties to the data they encode
- Visual insight correspondence, linking chart content to a detected visual insight
This schema permits explicit representation of (i) how graphical marks encode data variables and values; (ii) mapping between visual attributes and semantics (e.g., bar color encoding “Region”); and (iii) detected visual insights. The entire KG can be tensorized as a 3-way adjacency tensor indexed by head entity, relation, and tail entity (Zhou et al., 2024).
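To make the graph and its tensor form concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The relation names, the node naming scheme, and the direction chosen for each edge are assumptions made for illustration; the source does not fix them.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["head", "relation", "tail"])

# Hypothetical names for the four relation types (assumption, not from the paper).
RELATIONS = ["visual_property", "data_variable", "visual_encoding", "visual_insight"]

# A tiny hand-written KG for one bar; edge directions are illustrative.
TRIPLES = [
    Triple("bar_1", "visual_property", "height=50px"),
    Triple("bar_1", "visual_property", "color=blue"),
    Triple("Region", "data_variable", "ArabRegion"),
    Triple("color=blue", "visual_encoding", "ArabRegion"),
    Triple("Region", "visual_insight", "dominantBar"),
]


def tensorize(triples, relations):
    """Build a dense binary tensor laid out as tensor[relation][head][tail]."""
    entities = sorted({t.head for t in triples} | {t.tail for t in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    r_idx = {r: k for k, r in enumerate(relations)}
    n = len(entities)
    tensor = [[[0] * n for _ in range(n)] for _ in relations]
    for t in triples:
        tensor[r_idx[t.relation]][e_idx[t.head]][e_idx[t.tail]] = 1
    return tensor, e_idx, r_idx


tensor, e_idx, r_idx = tensorize(TRIPLES, RELATIONS)
```

A dense nested list is only practical for toy graphs; a sparse set of index triples would be the more natural choice at scale.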
2. Chart Image to KG Conversion Pipeline
The pipeline to extract ChartKG from a chart image consists of three primary stages:
- Chart Classification: A ResNet50, pretrained on ImageNet and fine-tuned (Adam, lr=5e-4), predicts chart type (bar, line, pie, scatter), achieving 87.2% test accuracy on a balanced dataset.
- Chart Parsing:
  - Object Detection: YOLOv5 models (one per chart type) detect bounding boxes and types of constituent elements (title, axis titles/labels, legend regions, graphical marks), with per-class mAP@0.5:0.95 as high as 0.932 for bars and 0.900 for pie slices; line element detection is less accurate (mAP@0.5:0.95 = 0.645).
  - Optical Character Recognition: For each detected text box, regions are rescaled and parsed using Tesseract, with text recognition accuracy >70% for most categories and >90% for axis titles and labels.
  - Graphical-Mark Parsing: Rule-based analysis matches visual mark properties to semantic data variables. For example, bar heights and colors are quantified, line segments are traced and their colors read, pie slice angles and hues are computed, and scatter point coordinates are extracted.
- KG Construction: Linking rules associate detected visual properties and data variables according to the aforementioned relations, and computed insights are added as nodes and triples when appropriate (Zhou et al., 2024).
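The rule-based parsing and linking steps above can be sketched for the bar-chart case. This is an illustrative sketch under stated assumptions, not the published rules: it assumes a linear y-axis calibrated from two OCR-read ticks, nearest-label matching on the x-axis, and hypothetical relation names and input field names (`x_center`, `y_top`, `y_bottom`, `color`). The pixel coordinates and the second region label are made-up sample data.

```python
def pixel_to_value(y_px, tick_a, tick_b):
    """Linearly map a pixel y-coordinate to a data value.

    tick_a and tick_b are (pixel_y, data_value) pairs read from the axis.
    """
    (pa, va), (pb, vb) = tick_a, tick_b
    return va + (y_px - pa) * (vb - va) / (pb - pa)


def link_bars(bars, x_labels, ticks, x_var, y_var):
    """Turn detected bars into (head, relation, tail) triples."""
    triples = []
    for i, bar in enumerate(bars):
        node = f"bar_{i}"
        height_px = bar["y_bottom"] - bar["y_top"]
        # Assign the x-axis label whose center is closest to the bar center.
        label = min(x_labels, key=lambda lab: abs(lab[0] - bar["x_center"]))[1]
        value = pixel_to_value(bar["y_top"], *ticks)
        height_node = f"height={height_px}px"
        triples += [
            (node, "visual_property", height_node),
            (node, "visual_property", f"color={bar['color']}"),
            (x_var, "data_variable", label),
            (y_var, "data_variable", value),
            (height_node, "visual_encoding", value),
        ]
    return triples


# Sample detections in image coordinates (y grows downward); values are made up.
bars = [
    {"x_center": 120, "y_top": 200, "y_bottom": 400, "color": "blue"},
    {"x_center": 220, "y_top": 100, "y_bottom": 400, "color": "blue"},
]
x_labels = [(118, "ArabRegion"), (223, "Europe")]
ticks = ((400, 0.0), (100, 30.0))  # pixel 400 is value 0, pixel 100 is value 30

triples = link_bars(bars, x_labels, ticks, "Region", "Savings")
```

Two calibration ticks are enough for a linear axis; a logarithmic axis would need a different mapping.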
3. Illustrative ChartKG Construction: Case Studies
The expressiveness of ChartKG is demonstrated through diverse example analyses:
- Bar Chart: For “Adjusted Net Savings by Region in 2010,” bars are nodes linked to properties (e.g., height, color), data variables (“Region,” “Year,” “Savings”), and visual insight (“dominantBar”). Triples encode property correspondences (bar to height), encodings (height to Savings), and insights (“Region” linked to “dominantBar”).
- Line Chart: For “Education Costs in India vs. Ukraine (2006–2008),” line elements link via color to country, segments to year/cost, and visual insights capture “positiveCorrelation,” “upwardTrend.”
- Pie Chart: Slices are nodes with angle and color encoding, connected to data variables (“Browser,” “Share”) and insights (“dominantSlice”).
- Scatter Plot: Each point stores position and color attributes mapped to semantic variables, with insight nodes for “clustered” and “outlier” patterns (Zhou et al., 2024).
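The insight nodes in these case studies could be produced by simple rules over the extracted values. The sketch below is one plausible formulation, not the rules used by the authors: the 1.5x dominance ratio, the strict-monotonicity test for a trend, and all function names are assumptions. The share values for browsers other than Chrome are invented sample data.

```python
def dominant_item(values, ratio=1.5):
    """Return the label whose value is at least `ratio` times the runner-up, else None."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None
    (top_label, top), (_, second) = ranked[0], ranked[1]
    return top_label if top >= ratio * second else None


def is_upward_trend(series):
    """True when every value is strictly greater than the one before it."""
    return len(series) >= 2 and all(b > a for a, b in zip(series, series[1:]))


def insight_triples(variable, values, mark="Slice"):
    """Emit a visual-insight triple when a dominant item exists."""
    triples = []
    if dominant_item(values) is not None:
        triples.append((variable, "visual_insight", f"dominant{mark}"))
    return triples


shares = {"Chrome": 60, "Firefox": 25, "Safari": 15}  # illustrative values
pie_insights = insight_triples("Browser", shares)
```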
4. Quantitative Performance Metrics
Performance evaluation for ChartKG centers on two core components:
- Object Detection: On the held-out test set, precision for key graphical classes is 0.999 (title), 0.999 (x-axis title), 0.999 (bar), 0.981 (line), 0.999 (pie), 0.995 (scatter point); mAP@0.5:0.95 ranges from 0.645 (line) to 0.932 (bar).
- Optical Character Recognition: Text extraction attains >70% accuracy overall, typically >90% for axis titles and labels.
These metrics indicate that ChartKG’s pipeline parses most visual and textual elements reliably enough for KG construction, with line marks (mAP@0.5:0.95 = 0.645) as the main weak point (Zhou et al., 2024).
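For readers unfamiliar with the detection metric: mAP@0.5:0.95 follows the COCO convention of averaging average precision over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU computation and the threshold sweep for a single predicted box, not a full mAP evaluation. The box format `(x1, y1, x2, y2)` and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Thresholds at which `pred` would count as a true positive for `truth`."""
    score = iou(pred, truth)
    return [t for t in THRESHOLDS if score >= t]


passed = thresholds_passed((0, 0, 10, 10), (0, 0, 8, 9))
```

A loosely fitting box passes only the lower thresholds, which is consistent with thin marks such as lines scoring well on precision while scoring lower on mAP@0.5:0.95.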
5. Downstream Applications: Retrieval and Question Answering
ChartKG supports two principal application domains:
- Semantic-Aware Chart Retrieval: Charts are indexed by their KGs. User queries specify chart type, data variables/values, relation patterns (e.g., requesting a certain color–country encoding), and target visual insights. A user study reported >4.0/5 satisfaction for insight-driven queries, compared with 2.3/5 for a keyword-only baseline, with retrieval latency under 2 seconds.
- Chart VQA: Two paradigms are implemented. (A) Rule-based traversal of the KG delivers structured QA (data comparison, encoding queries, insight inference) with 92% accuracy and 0.6 sec/question. (B) Text-to-text modeling (KG-T5) yields 85% accuracy (3.6 sec/question), outperforming flat-table baselines (ChartQA: 78% accuracy, 3.4 sec/question) (Zhou et al., 2024).
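Both the retrieval queries and the rule-based QA paradigm come down to matching triple patterns against a chart's KG. The following is a minimal sketch of that idea, assuming each KG is a list of `(head, relation, tail)` tuples and that `None` acts as a wildcard. The function names, relation names, and the sample index are hypothetical.

```python
def match(triples, head=None, relation=None, tail=None):
    """Return triples matching the pattern; None matches anything."""
    return [
        t for t in triples
        if (head is None or t[0] == head)
        and (relation is None or t[1] == relation)
        and (tail is None or t[2] == tail)
    ]


def retrieve(index, patterns):
    """Return ids of charts whose KG satisfies every pattern."""
    return [
        chart_id for chart_id, kg in index.items()
        if all(match(kg, *p) for p in patterns)
    ]


def what_encodes(kg, data_value):
    """Rule-based QA: which visual property values encode `data_value`?"""
    return [h for h, _, _ in match(kg, relation="visual_encoding", tail=data_value)]


# Illustrative index with two tiny KGs (contents are made up).
index = {
    "chart_a": [
        ("line_1", "visual_property", "color=red"),
        ("color=red", "visual_encoding", "India"),
        ("Cost", "visual_insight", "upwardTrend"),
    ],
    "chart_b": [
        ("slice_1", "visual_property", "color=blue"),
        ("color=blue", "visual_encoding", "Chrome"),
        ("Browser", "visual_insight", "dominantSlice"),
    ],
}

hits = retrieve(index, [
    ("color=red", "visual_encoding", "India"),
    (None, "visual_insight", "upwardTrend"),
])
answer = what_encodes(index["chart_b"], "Chrome")
```

A linear scan is enough for a sketch; a production index would more likely use a graph store or a query language such as SPARQL.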
6. Integration of Chart Parsing Systems with ChartKG
ChartKG is designed to be the unified target for diverse chart extraction systems, including vision-LLMs. "OneChart" (Chen et al., 2024), a numerically robust structural extractor using a vision-language backbone augmented by an auxiliary numerical token and self-consistency-based confidence scoring, emits Python-dict structured outputs. These can be systematically mapped into ChartKG's ontology by (i) representing the chart as a node, (ii) assigning literal-valued edges for metadata (title, axis labels), (iii) representing each data point as a node with property and value edges, (iv) associating system-calculated confidence scores as attributes, and (v) storing both numeric values and extraction confidences. This permits comparative reasoning over structure, numeric content, and extraction reliability within the KG paradigm (Chen et al., 2024).
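Steps (i) through (v) can be sketched as a small conversion function. The dict layout below (`title`, `x_title`, `y_title`, `values`) is an assumed shape for an extractor's output, not a documented OneChart schema, and the relation names, node ids, numeric values, and confidence score are all illustrative.

```python
def dict_to_triples(chart_id, extracted, confidence):
    """Map an extractor's dict output to (head, relation, tail) triples."""
    # (i) the chart itself becomes a node
    triples = [(chart_id, "type", "Chart")]
    # (ii) metadata as literal-valued edges
    for key in ("title", "x_title", "y_title"):
        if key in extracted:
            triples.append((chart_id, key, extracted[key]))
    # (iii)-(v) one node per data point, carrying label, value, and confidence
    for i, (label, value) in enumerate(extracted.get("values", {}).items()):
        node = f"{chart_id}/point_{i}"
        triples += [
            (node, "partOf", chart_id),
            (node, "label", label),
            (node, "value", value),
            (node, "confidence", confidence),
        ]
    return triples


# Hypothetical extractor output; the numbers are made up.
extracted = {
    "title": "Adjusted Net Savings by Region in 2010",
    "x_title": "Region",
    "y_title": "Savings",
    "values": {"ArabRegion": 12.5, "Europe": 9.0},
}
onechart_triples = dict_to_triples("chart_1", extracted, confidence=0.9)
```

This sketch attaches one chart-level confidence to every point; per-point scores would fit the same structure if an extractor provides them.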
7. Prospective Directions and Extensions
Prospective research directions include extending ChartKG to compound and interactive visualizations, refining raw-data extraction for stronger numeric reasoning, and developing interfaces for human-AI co-editing of extracted KGs. Such extensions could plausibly support comprehensive chart analytics, provenance tracking, and more complex reasoning over visually encoded data (Zhou et al., 2024).