NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation (2505.17121v2)

Published 21 May 2025 in cs.CL and cs.AI

Abstract: Obtaining large-scale, high-quality reasoning data is crucial for improving the geometric reasoning capabilities of multi-modal LLMs (MLLMs). However, existing data generation methods, whether based on predefined templates or constrained symbolic provers, inevitably face diversity and numerical generalization limitations. To address these limitations, we propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. First, we propose a domain-specific language grounded in the entity-attributes-relations paradigm to comprehensively represent all components of plane geometry, along with generative actions defined within this symbolic space. We then design a symbolic-visual-text pipeline that synthesizes symbolic sequences, maps them to visual and textual representations, and generates reasoning paths with reverse search and forward validation. Based on this framework, we construct NeSyGeo-CoT and NeSyGeo-Caption datasets, containing 100k samples, and release a new benchmark NeSyGeo-Test for evaluating geometric reasoning abilities in MLLMs. Experiments demonstrate that the proposal significantly and consistently improves the performance of multiple MLLMs under both reinforcement and supervised fine-tuning. With only 4k samples and two epochs of reinforcement fine-tuning, base models achieve improvements of up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. Notably, a 4B model can be improved to outperform an 8B model from the same series on geometric reasoning tasks.

Summary

The paper introduces a neuro-symbolic approach (NeSyGeo) that combines symbolic constructs with visual and textual modalities to generate training data for geometric reasoning.
It employs Geo-DSL for defining plane geometry and uses a two-stage BFS and verification process to systematically generate and validate reasoning chains.
Empirical evaluations show performance gains up to +15.8% on MathVision, highlighting the framework's scalability and effectiveness in enhancing MLLMs.

Introduction to NeSyGeo Framework

The paper presents NeSyGeo, a neuro-symbolic framework intended for automated multimodal geometric reasoning data generation. NeSyGeo systematically addresses the limitations of existing geometric reasoning datasets by integrating symbolic constructs with visual and textual modalities. It utilizes a domain-specific symbolic language, Geo-DSL, to define plane geometry elements, facilitating diverse and scalable data synthesis.

Symbolic Language and Framework

Geo-DSL is structured upon the entity–attributes–relations paradigm, offering concise definitions that encompass primitive geometric constructs. This language captures a broad spectrum of plane geometry through succinct statements, ensuring comprehensive numerical and relational representations. The NeSyGeo pipeline then leverages these symbolic definitions to synthesize multimodal datasets comprising high-quality images and logical reasoning paths.
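The entity-attributes-relations idea can be pictured with a small data structure. The sketch below is illustrative only: the class names, the statement syntax, and the example type names such as "Triangle" and "Perpendicular" are assumptions made for this example, not the actual Geo-DSL grammar from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str  # hypothetical type name, e.g. "Triangle" or "Segment"
    name: str  # e.g. "ABC" or "AB"


@dataclass(frozen=True)
class Attribute:
    entity: str   # name of the entity the value belongs to
    key: str      # e.g. "length"
    value: float  # numeric value attached to the entity


@dataclass(frozen=True)
class Relation:
    kind: str   # hypothetical relation name, e.g. "Perpendicular"
    args: tuple  # names of the related entities


@dataclass
class Scene:
    entities: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def to_statements(self):
        """Render the scene as one short statement per component."""
        lines = [f"{e.kind}({e.name})" for e in self.entities]
        lines += [f"{a.key}({a.entity}) = {a.value:g}" for a in self.attributes]
        lines += [f"{r.kind}({', '.join(r.args)})" for r in self.relations]
        return lines


# A right triangle with legs AB = 3 and BC = 4 (made-up example values).
scene = Scene(
    entities=[Entity("Triangle", "ABC")],
    attributes=[Attribute("AB", "length", 3.0), Attribute("BC", "length", 4.0)],
    relations=[Relation("Perpendicular", ("AB", "BC"))],
)
statements = scene.to_statements()
```

Keeping entities, attributes, and relations as separate records means the same scene can later be rendered as a diagram, as text, or as both, without re-parsing the statements.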

Data generation begins with a sequence of symbolic actions whose parameters are drawn within predefined bounds; the sequence is built up step by step through probabilistic sampling. The framework then validates the symbolic sequence and converts it into a diagram and a natural language description, so that the information carried by the image and by the text does not overlap.
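As a rough illustration of sampling generative actions within parametric bounds, the sketch below draws a short action sequence with weighted random choices. The action names, their weights, and the length bounds are all hypothetical placeholders; the paper's actual action set and sampling distribution are not given in this summary.

```python
import random

# Hypothetical generative actions and their sampling weights.
ACTIONS = {
    "add_triangle": 0.4,
    "add_circle": 0.2,
    "add_midpoint": 0.2,
    "add_perpendicular": 0.2,
}

# Hypothetical bounds for a sampled length parameter.
LENGTH_BOUNDS = (1.0, 10.0)


def sample_action_sequence(n_steps, seed=None):
    """Sample n_steps (action, length) pairs; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    weights = list(ACTIONS.values())
    sequence = []
    for _ in range(n_steps):
        # Pick one action according to its weight.
        action = rng.choices(names, weights=weights, k=1)[0]
        # Draw the numeric parameter inside the allowed range.
        length = round(rng.uniform(*LENGTH_BOUNDS), 1)
        sequence.append((action, length))
    return sequence


sequence = sample_action_sequence(5, seed=0)
```

Passing the same seed reproduces the same sequence, which is convenient when a generated sample has to be regenerated for debugging.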

Figure 1: Performance comparison of different MLLMs and LLMs with and without image input in several geometry datasets. The minimal or negligible drops observed upon image removal in GeoQA and R-CoT raise concerns regarding the utilization of visual information for geometric reasoning.

Reasoning and Validation Mechanism

A distinctive aspect of NeSyGeo is its two-stage reasoning mechanism, which combines a progressive reverse search with a forward validation process. The reverse search uses LLMs to expand reasoning states, and the validation step checks data correctness, so diverse and valid reasoning chains can be synthesized without an exhaustive search.

The first stage employs a Breadth-First Search (BFS) accelerated by LLM-driven preferences, incrementally expanding the search frontier and forming candidate question-answer pairs. Subsequently, the Verifier rigorously validates the logical coherence of this generated data.
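The two stages can be illustrated on a toy problem. The sketch below is not the paper's implementation: the rule table, the function names, and the right-triangle example are assumptions, and the LLM-driven preference is replaced by plain first-in-first-out ordering. It searches backward from a target quantity to the given conditions, then recomputes the answer forward and compares it with the expected value.

```python
import math
from collections import deque

# Hypothetical rule table: quantity -> (premises, function computing it).
RULES = {
    "c": (("a", "b"), lambda a, b: math.hypot(a, b)),
    "area": (("a", "b"), lambda a, b: 0.5 * a * b),
    "perimeter": (("a", "b", "c"), lambda a, b, c: a + b + c),
}


def reverse_search(target, givens):
    """BFS backward from target; return the derived quantities it needs.

    Returns None if some required quantity is neither given nor derivable.
    """
    queue = deque([target])
    seen = {target}
    needed = []
    while queue:
        quantity = queue.popleft()
        if quantity in givens:
            continue  # reached a known condition, nothing to expand
        if quantity not in RULES:
            return None
        needed.append(quantity)
        for premise in RULES[quantity][0]:
            if premise not in seen:
                seen.add(premise)
                queue.append(premise)
    return needed


def forward_validate(steps, givens, target, expected, tol=1e-9):
    """Recompute the chain from the givens and check the final answer."""
    known = dict(givens)
    pending = list(steps)
    while pending:
        # A step is ready once all of its premises have been computed.
        ready = [s for s in pending if all(p in known for p in RULES[s][0])]
        if not ready:
            return False  # the chain cannot be completed
        for step in ready:
            premises, fn = RULES[step]
            known[step] = fn(*(known[p] for p in premises))
            pending.remove(step)
    return target in known and math.isclose(known[target], expected, abs_tol=tol)


givens = {"a": 3.0, "b": 4.0}
steps = reverse_search("perimeter", givens)
is_valid = forward_validate(steps, givens, "perimeter", expected=12.0)
```

The forward pass does not trust the order in which the search discovered the steps; it only executes a step once its premises are known, so an incomplete or circular chain is rejected rather than silently accepted.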

Synthesis Pipeline and Output

NeSyGeo transforms these symbolic constructs into visual and textual outputs while preserving the fidelity and diversity of the synthesized data. The framework's emphasis on high-resolution, semantically annotated graphics addresses the image quality issues of previous datasets. Furthermore, the explicit partitioning of information across modalities compels models to engage with both visual and textual inputs.
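One simple way to picture this partitioning is to assign each problem condition to exactly one modality. The sketch below is an assumption-laden illustration, not the paper's procedure: the function name, the random split, and the example condition strings are all hypothetical.

```python
import random


def split_conditions(conditions, seed=None):
    """Split conditions into (image_only, text_only) with no overlap.

    Each side receives at least one condition, so neither the image
    nor the text alone carries the full problem statement.
    """
    if len(conditions) < 2:
        raise ValueError("need at least two conditions to split")
    rng = random.Random(seed)
    shuffled = list(conditions)
    rng.shuffle(shuffled)
    # Choose a cut point that leaves both sides non-empty.
    cut = rng.randint(1, len(shuffled) - 1)
    return shuffled[:cut], shuffled[cut:]


# Made-up conditions for a right triangle.
conditions = ["length(AB) = 3", "length(BC) = 4", "Perpendicular(AB, BC)"]
image_only, text_only = split_conditions(conditions, seed=1)
```

Because the two lists are disjoint and together cover every condition, a model that ignores either input is missing information it needs to solve the problem.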

Figure 2: Comparison of the NeSyGeo-CoT dataset with other popular geometry datasets. Geometry-3K is a manually synthesized dataset, while the remaining approaches employ automatic generation techniques. Our dataset features high-quality reasoning chains and a balanced distribution of information between images and text.

Empirical Evaluation

The synthesis framework produces clean diagrams and correct reasoning paths, and training on its data substantially improves model performance. Empirical results support NeSyGeo's effectiveness, with notable improvements observed across multiple MLLMs and benchmarks.

NeSyGeo's datasets yield gains under both reinforcement and supervised fine-tuning. With only 4k samples and two epochs of reinforcement fine-tuning, base models improve by up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. The abstract also reports that a 4B model can be improved to outperform an 8B model from the same series on geometric reasoning tasks.

Figure 3: Model performance on MathVerse as the RL training steps increase. With InternVL2.5-4B as the base model, metrics consistently improve throughout training.

Conclusion

In conclusion, NeSyGeo introduces a scalable framework for generating multimodal geometric reasoning data, combining a symbolic language with LLM-guided search to address the diversity and numerical generalization limits of existing data generation methods. Training on its data improves the geometric reasoning of multiple MLLMs, and the accompanying NeSyGeo-Test benchmark offers a way to evaluate that ability.

NeSyGeo's experimental results highlight the framework's efficiency in advancing the geometric reasoning abilities of MLLMs, offering promising directions for future research in automated data generation and model training paradigms.