- The paper introduces a neuro-symbolic framework (NeSyGeo) that fuses symbolic constructs with visual and textual modalities to synthesize data that strengthens geometric reasoning.
- It uses Geo-DSL to define plane geometry and a two-stage process (an LLM-guided BFS reverse search followed by forward verification) to systematically generate and validate reasoning chains.
- Empirical evaluations show performance gains up to +15.8% on MathVision, highlighting the framework's scalability and effectiveness in enhancing MLLMs.
Introduction to NeSyGeo Framework
The paper presents NeSyGeo, a neuro-symbolic framework for automatically generating multimodal geometric reasoning data. NeSyGeo targets the limitations of existing geometric reasoning datasets by integrating symbolic constructs with visual and textual modalities. It uses a domain-specific symbolic language, Geo-DSL, to define plane geometry elements, enabling diverse and scalable data synthesis.
Symbolic Language and Framework
Geo-DSL is built on the entity–attributes–relations paradigm, with concise definitions covering primitive geometric constructs. Through short statements it captures a broad range of plane geometry, representing both numerical attributes and the relations between elements. The NeSyGeo pipeline then uses these symbolic definitions to synthesize multimodal data consisting of high-quality images and logical reasoning paths.
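The paper's exact Geo-DSL syntax is not reproduced here. As a rough illustration of the entity–attributes–relations idea, the hypothetical Python classes below (`Entity`, `Relation`, `Scene`) store entities with numeric attributes plus relations between them, and print one short statement per item. The statement format is an assumption made for this sketch, not Geo-DSL itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    name: str                 # e.g. "ABC" (hypothetical naming scheme)
    kind: str                 # e.g. "Triangle", "Circle", "Segment"
    attributes: Dict[str, float] = field(default_factory=dict)


@dataclass
class Relation:
    kind: str                 # e.g. "Perpendicular", "Tangent"
    args: Tuple[str, ...]     # names of the entities involved


@dataclass
class Scene:
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def statements(self) -> List[str]:
        """Render one illustrative statement per entity and per relation."""
        lines = []
        for e in self.entities:
            attrs = ", ".join(f"{k}={v:g}" for k, v in sorted(e.attributes.items()))
            lines.append(f"{e.kind}({e.name})" + (f" {{{attrs}}}" if attrs else ""))
        for r in self.relations:
            lines.append(f"{r.kind}({', '.join(r.args)})")
        return lines


# Example: an isosceles-style triangle with an altitude from A.
scene = Scene(
    entities=[Entity("ABC", "Triangle", {"AB": 5.0, "angle_A": 40.0}),
              Entity("AD", "Segment")],
    relations=[Relation("Perpendicular", ("AD", "BC"))],
)
# scene.statements() -> ["Triangle(ABC) {AB=5, angle_A=40}", "Segment(AD)", "Perpendicular(AD, BC)"]
```

Keeping attributes and relations as separate lists mirrors the split between numerical facts and relational facts that the summary attributes to Geo-DSL.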
Generation begins with sequences of symbolic actions sampled within predefined parameter bounds, with the sequences extended systematically through probabilistic sampling. The framework then validates this symbolic data and converts it into visual diagrams and natural-language descriptions, keeping the information carried by each modality non-overlapping (orthogonal).
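One way to picture this step is sampling a construction program action by action. In the sketch below, the action names, the parameter bounds in `ACTIONS`, and the stopping probability `p_continue` are all hypothetical, and "extended systematically" is read as the sequence growing one sampled action at a time. The paper's actual action set and sampling schedule may differ.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical action vocabulary: each parameter has a (low, high) bound.
ACTIONS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "add_triangle": {"side": (3.0, 10.0), "angle": (30.0, 120.0)},
    "add_circle": {"radius": (1.0, 6.0)},
    "add_altitude": {},
    "add_midpoint": {},
}


def _sample_params(rng: random.Random, name: str) -> Dict[str, float]:
    """Draw every parameter of `name` uniformly inside its bounds (1 decimal)."""
    return {p: round(rng.uniform(lo, hi), 1) for p, (lo, hi) in ACTIONS[name].items()}


def sample_program(rng: random.Random, min_steps: int = 1, max_steps: int = 6,
                   p_continue: float = 0.6) -> List[Tuple[str, Dict[str, float]]]:
    """Sample a sequence of construction actions.

    Starts from one base shape, then appends a random action with probability
    `p_continue` per step, so longer (harder) programs are rarer but reachable.
    """
    program = [("add_triangle", _sample_params(rng, "add_triangle"))]
    while len(program) < max_steps and (len(program) < min_steps or rng.random() < p_continue):
        name = rng.choice(list(ACTIONS))
        program.append((name, _sample_params(rng, name)))
    return program
```

Seeding the `random.Random` instance makes each sampled program reproducible, which is convenient when a later validation step needs to regenerate the same construction.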
Figure 1: Performance comparison of different MLLMs and LLMs with and without image input on several geometry datasets. The small or negligible drops observed on GeoQA and R-CoT when images are removed raise doubts about whether models actually use visual information for geometric reasoning on these datasets.
Reasoning and Validation Mechanism
A distinctive aspect of NeSyGeo is its two-stage reasoning mechanism: a progressive reverse search followed by a forward validation process. The search exploits LLMs' ability to expand reasoning states, and the validation ensures data correctness, so diverse and valid reasoning chains can be synthesized without exhaustive search.
The first stage runs a Breadth-First Search (BFS) guided by LLM-driven preferences, incrementally expanding the search frontier and forming candidate question-answer pairs. In the second stage, the Verifier checks the logical coherence of the generated data.
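The two stages can be illustrated on toy symbolic facts. This is a simplified sketch under stated assumptions, not the paper's implementation: facts are plain strings, rules are hypothetical `(premises, conclusion)` pairs, the reverse search works backward from a target fact, and a plain `score` function stands in for the LLM preference by pruning each BFS layer to a fixed `beam`. The forward check then replays the found chain from the givens, loosely mirroring the Verifier.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

# A rule derives `conclusion` once every fact in `premises` is known.
Rule = Tuple[FrozenSet[str], str]


def reverse_bfs(target: str, givens: FrozenSet[str], rules: List[Rule],
                score: Callable[[FrozenSet[str]], float],
                beam: int = 4, max_depth: int = 6) -> Optional[List[Rule]]:
    """Stage 1: search backward from `target` until all open goals are givens.

    A state is (goal set, rules applied so far). After each BFS layer only the
    `beam` highest-scoring states survive; `score` is a placeholder for the
    LLM preference described in the text.
    """
    frontier: List[Tuple[FrozenSet[str], List[Rule]]] = [(frozenset([target]), [])]
    for depth in range(max_depth + 1):
        for goals, chain in frontier:
            if goals <= givens:          # every remaining goal is already given
                return chain[::-1]       # reverse so steps run givens -> target
        if depth == max_depth:
            break
        next_layer = []
        for goals, chain in frontier:
            goal = min(goals - givens)   # expand one open goal (deterministic pick)
            for premises, conclusion in rules:
                if conclusion == goal:
                    next_layer.append(((goals - {goal}) | premises,
                                       chain + [(premises, conclusion)]))
        next_layer.sort(key=lambda state: score(state[0]), reverse=True)
        frontier = next_layer[:beam]     # beam pruning in place of LLM preference
    return None


def forward_verify(chain: List[Rule], givens: FrozenSet[str], target: str) -> bool:
    """Stage 2: replay the chain from the givens, rejecting any unsupported step."""
    known = set(givens)
    for premises, conclusion in chain:
        if not premises <= known:
            return False
        known.add(conclusion)
    return target in known


# Toy isosceles-triangle example with hypothetical rules.
givens = frozenset({"AB = AC", "angle A = 40"})
rules: List[Rule] = [
    (frozenset({"AB = AC"}), "angle B = angle C"),
    (frozenset({"angle B = angle C", "angle A = 40"}), "angle B = 70"),
    (frozenset({"angle B = 70"}), "exterior angle B = 110"),
]
target = "exterior angle B = 110"
chain = reverse_bfs(target, givens, rules, score=lambda goals: -len(goals))
```

With these toy rules the search recovers the three-step chain from AB = AC to the exterior angle, and `forward_verify` accepts it; the same steps in reverse order are rejected because their premises are not yet established when replayed.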
Synthesis Pipeline and Output
NeSyGeo converts these symbolic constructs into visual and textual outputs while preserving the fidelity and diversity of the synthesized data. Its high-resolution, semantically annotated diagrams address the image-quality issues of earlier datasets. In addition, explicitly partitioning information across modalities forces models to use both the visual and the textual input.
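The information-partitioning idea can be sketched as assigning each fact to exactly one modality. The function `partition_facts` and its `p_image` parameter are hypothetical. The only property it enforces is that the facts needed for the answer are split so that both the diagram and the text carry at least one of them, so neither input alone contains every required fact.

```python
import random
from typing import List, Set, Tuple


def partition_facts(required: List[str], extra: List[str], rng: random.Random,
                    p_image: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Split facts between the diagram (image) and the question text.

    Each fact goes to exactly one modality. The first two shuffled required
    facts are pinned to different modalities; the rest are assigned randomly.
    Assumes at least two required facts.
    """
    if len(required) < 2:
        raise ValueError("need at least two required facts to split")
    shuffled = required[:]
    rng.shuffle(shuffled)
    image, text = {shuffled[0]}, {shuffled[1]}
    for fact in shuffled[2:] + extra:
        (image if rng.random() < p_image else text).add(fact)
    return image, text
```

Because the two sets are disjoint, any fact shown in the diagram is omitted from the text and vice versa, which is one simple reading of "orthogonal" modality information.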
Figure 2: Comparison of the NeSyGeo-CoT dataset with other popular geometry datasets. Geometry-3K is manually constructed, while the remaining datasets are generated automatically. Our dataset features high-quality reasoning chains and a balanced distribution of information between images and text.
Empirical Evaluation
The synthesis framework not only generates visually clean diagrams and correct reasoning paths but also substantially improves model performance. Empirical results confirm NeSyGeo's effectiveness, with notable gains across multiple MLLMs and benchmarks.
When used for reinforcement learning and supervised fine-tuning, NeSyGeo's datasets yield significant gains: up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. This indicates that the data can improve the geometric reasoning of existing base models in a scalable and effective way.
Figure 3: Model performance on MathVerse as the number of RL training steps increases. With InternVL2.5-4B as the base model, metrics improve consistently throughout training.
Conclusion
In conclusion, NeSyGeo offers a robust, scalable framework for generating multimodal geometric reasoning data, using neuro-symbolic integration to address the limitations of existing datasets. Its synthesis approach significantly improves model reasoning across visual and textual modalities, setting a new benchmark for geometric problem-solving in MLLMs.
NeSyGeo's experimental results highlight the framework's effectiveness in advancing the geometric reasoning abilities of MLLMs and point to promising directions for future research on automated data generation and model training paradigms.