Formal Description of Visualization (FDV)
- Formal Description of Visualization (FDV) is a structured, multi-part schema that encodes key chart properties such as layout, scales, data, and marks.
- It bridges human-authored visuals with automated pipelines, enabling systematic generation and refinement of multimodal reports.
- Empirical evaluations show that FDV improves report coherence and visualization quality; the framework built on it reports an 82% win rate over its baseline.
The Formal Description of Visualization (FDV) refers to a structured textual representation of visualization charts, designed to enable LLMs to systematically understand, generate, and refine diverse, high-quality visualizations in concert with textual reports. FDV, as developed in the context of the Multimodal DeepResearcher framework (2506.02454), draws foundationally on the Grammar of Graphics paradigm and encodes all critical properties of a visualization into a machine- and human-readable multi-part schema. This formalism is intended to bridge the gap between professionally authored visualizations and automated agentic generation pipelines, addressing challenges in integrating informative charts within text-heavy documents.
1. Structured Representation: Components and Principles
FDV is constructed as a multi-section, hierarchical schema that captures the essential elements of a visualization across four domains:
- Layout (Part-A): Describes the spatial organization, including figure dimensions, subplot arrangement, backgrounds, margins, titles, subtitles, captions, and whitespace or composition attributes.
- Plotting Scale (Part-B): Details the use and mapping of scales and axes, specifying formatting, placement, scaling strategies, and logical linkage between data variables and visual variables (e.g., x/y axes, color, size).
- Data (Part-C): Enumerates all information content presented, including data tables, textual components (labels, legends, titles), and annotations that contribute to the chart’s interpretative clarity.
- Marks (Part-D): Specifies the visual primitives (such as bars, points, and lines) together with font usage, alignment, color palette, annotation techniques (arrows, footnotes), and rules for mark overlap or interaction.
The entire FDV schema is structured for machine parsing. An illustrative template is given as:
    {
      "Part-A: Overall Layout": { ... },
      "Part-B: Plotting Scale": { ... },
      "Part-C: Data": { ... },
      "Part-D: Marks": { ... }
    }
This hierarchy allows the encoding of a wide diversity of chart types, spanning bar and line charts to dashboards and infographics, including complex or compositionally rich designs.
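As a minimal sketch, the four-part template above could be held as a plain Python dictionary and checked for completeness before it is passed downstream. The top-level key names follow the template; the inner fields (`width`, `x`, `rows`, and so on) and the helper `missing_parts` are illustrative assumptions, not part of the published schema.

```python
# The four top-level section names, taken from the FDV template.
FDV_PARTS = (
    "Part-A: Overall Layout",
    "Part-B: Plotting Scale",
    "Part-C: Data",
    "Part-D: Marks",
)


def missing_parts(fdv: dict) -> list[str]:
    """Return the top-level FDV sections that are absent or not dict-valued."""
    return [part for part in FDV_PARTS if not isinstance(fdv.get(part), dict)]


# Hypothetical FDV for a simple bar chart; every inner field is an assumption.
example_fdv = {
    "Part-A: Overall Layout": {"width": 640, "height": 400, "title": "Sales by region"},
    "Part-B: Plotting Scale": {
        "x": {"type": "band", "field": "region"},
        "y": {"type": "linear", "field": "sales"},
    },
    "Part-C: Data": {
        "rows": [
            {"region": "North", "sales": 12},
            {"region": "South", "sales": 7},
        ]
    },
    "Part-D: Marks": {"type": "bar", "color": "#4477aa"},
}

if __name__ == "__main__":
    print(missing_parts(example_fdv))           # []
    print(missing_parts({"Part-C: Data": {}}))  # the three other section names
```

A check of this kind only confirms that the four sections exist; it says nothing about whether their contents describe a sensible chart.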
2. Integration within Agentic Multimodal Research Pipelines
Within the Multimodal DeepResearcher framework, FDV is fundamental to an end-to-end pipeline for generating multimodal (text + chart) reports:
- Researching Stage: The system gathers knowledge and references through iterative search and reasoning; this material is later turned into text and charts.
- Exemplar Report Textualization: Human- or expert-created visualizations are algorithmically converted into FDV. This process (see Algorithm 1 in the paper) extracts the visual structure of reference artifacts and textualizes it for in-context learning.
- Planning: The outline and visual style guide for a new report are determined with FDV-encoded exemplars, ensuring consistent narrative flow and stylistic coherence.
- Multimodal Generation: The system produces the interleaved text and charts. For each FDV block in the report outline, code is generated targeting a visualization subsystem (such as D3.js). An actor-critic loop uses screenshots and execution feedback to iteratively refine and approve visual output before inclusion.
FDV therefore serves two roles: it is both the representational substrate for chart creation and the learning signal for in-context agentic planning.
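The actor-critic refinement step described in the Multimodal Generation stage can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the framework's implementation: `refine_chart`, `Review`, `toy_actor`, and `toy_critic` are hypothetical names, and the toy critic inspects the generated code string, whereas the system described above uses screenshots and execution feedback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str


def refine_chart(
    fdv: dict,
    actor: Callable[[dict, str], str],
    critic: Callable[[dict, str], Review],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate code generation and critique until approval or the round limit."""
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = actor(fdv, feedback)   # generate (or regenerate) chart code
        review = critic(fdv, code)    # judge the result against the FDV
        if review.approved:
            return code, True
        feedback = review.feedback    # carry the critique into the next round
    return code, False


def toy_actor(fdv: dict, feedback: str) -> str:
    """Stand-in for an LLM: adds a title only after being told it is missing."""
    code = "svg.append('rect');"
    if "title" in feedback:
        title = fdv["Part-A: Overall Layout"]["title"]
        code += f" svg.append('text').text('{title}');"
    return code


def toy_critic(fdv: dict, code: str) -> Review:
    """Stand-in for a screenshot-based critic: checks that the title is drawn."""
    title = fdv["Part-A: Overall Layout"]["title"]
    if title in code:
        return Review(True, "")
    return Review(False, "missing title")


if __name__ == "__main__":
    fdv = {"Part-A: Overall Layout": {"title": "Sales by region"}}
    code, approved = refine_chart(fdv, toy_actor, toy_critic)
    print(approved)  # True: the second round adds the title
```

Bounding the loop with `max_rounds` is a design choice in this sketch so that a chart the critic never accepts cannot stall the rest of the report.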
3. Evaluation and Benchmarking
Empirical evaluation is conducted using the MultimodalReportBench protocol, which includes:
- A dataset of 100 real-world topics with paired textual and visual research requirements.
- Automatic ranking of generated reports using five metrics: informativeness/depth, coherence/organization, verifiability (including references within visuals), visualization quality, and cross-chart consistency.
- Human evaluations to qualitatively validate the results.
Experiments (notably with the Claude 3.7 Sonnet model) indicate that integrating FDV in planning and generation yields substantial improvements: Multimodal DeepResearcher achieves an 82% overall win rate over the baseline method when both use the same model (2506.02454). Ablation studies indicate that omitting FDV-driven exemplar learning or refinement markedly reduces report quality, and a diversity analysis shows that FDV enables a greater variety and complexity of charts.
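A win rate of this kind is the share of pairwise comparisons in which one system's report is preferred. The sketch below shows the arithmetic only: the verdict labels, the convention of counting ties in the denominator, and the toy verdict list are all assumptions for illustration and are not data from the paper.

```python
from collections import Counter


def win_rate(verdicts: list[str]) -> float:
    """Fraction of comparisons labeled 'win'; ties and losses count against it."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons to score")
    return counts["win"] / total


if __name__ == "__main__":
    # Toy verdicts, invented for this example only.
    toy_verdicts = ["win", "win", "loss", "win", "win"]
    print(win_rate(toy_verdicts))  # 0.8
```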
4. Formalism, Automation, and Quality Control
FDV promotes high-fidelity, automatable chart generation via:
- In-context learning: Exposing LLMs to structured FDV representations from expert-authored examples.
- Universality: Supporting a wide array of visualization genres and layout types.
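- Code generation from an explicit spec: Because an FDV states layout, scales, data, and marks explicitly, it can be translated into code for a visualization library (such as D3.js) with far less interpretive ambiguity than a free-form description.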
- Refinement loop: The formal spec guides iterative corrective feedback, improving the visual, technical, and semantic quality of output.
This approach is intended to reduce hallucination, encourage information completeness and provenance, and let modular agentic systems orchestrate complex multimodal artifacts.
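The in-context learning mechanism listed above amounts to placing FDV exemplars in the model's prompt ahead of the new request. A minimal sketch follows, assuming a hypothetical helper `build_prompt` and an invented prompt wording; the actual prompts used by Multimodal DeepResearcher are not reproduced here.

```python
import json


def build_prompt(exemplars: list[dict], request: str) -> str:
    """Serialize FDV exemplars as JSON and append the new chart request."""
    blocks = [
        f"### Exemplar {i}\n{json.dumps(fdv, indent=2)}"
        for i, fdv in enumerate(exemplars, start=1)
    ]
    return "\n\n".join(
        [
            "You are given FDV descriptions of expert-authored charts.",
            *blocks,
            f"### Task\nWrite an FDV with the same four parts for: {request}",
        ]
    )


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated exemplar; field contents are assumptions.
    exemplar = {
        "Part-A: Overall Layout": {"title": "Sales by region"},
        "Part-B: Plotting Scale": {"x": "region", "y": "sales"},
        "Part-C: Data": {"rows": []},
        "Part-D: Marks": {"type": "bar"},
    }
    print(build_prompt([exemplar], "monthly active users, 2020 to 2024"))
```

Serializing exemplars as JSON keeps the four-part structure visible to the model, which is the property the schema is meant to provide.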
5. Applications and Broader Implications
The adoption of FDV in agentic frameworks such as Multimodal DeepResearcher enables practical automation and enhancement of:
- Business intelligence and policy reporting, where credibility and insight rely on tight integration of narrative and visual data.
- Automated generation of educational materials, scientific documentation, and media content.
- Open data and analytics platforms, where scalable, explainable, and customizable visual reporting is required.
The structured, universal nature of FDV paves the way for standardization of visualization representation, improved robustness and error detection in LLM-based pipelines, and more effective cross-tool and cross-domain interoperability.
6. Summary Table: FDV’s Roles in Multimodal DeepResearcher
Stage/Component | FDV Role or Benefit |
---|---|
Researching | Provides factual grounding for visuals and text |
Exemplar Textualization | Converts expert charts to FDV for LLM in-context learning |
Planning | Enables style, layout, and organization transfer via FDV |
Report Generation & Refinement | Guides code generation and iterative visual improvements |
Evaluation & Diversity | Facilitates automated and human assessment of quality/variety |
7. Impact and Future Directions
FDV establishes a foundation for more transparent, diverse, and verifiable automated visual communication. As structured representations such as FDV become embedded in agentic and LLM-powered content generation, they are likely to support future research into chart authentication, bias reduction, adaptation to new chart types, and community-driven standards for the formal textualization of visualizations.