- The paper introduces a two-phase pipeline that extracts events from historical narratives and formalizes them with the Coq proof assistant.
- It demonstrates that pure base generation outperforms RAG-enhanced approaches for stronger models, while weaker models require external scaffolding.
- The work overcomes RDF/OWL limitations by converting the extracted RDF representations into Coq specifications, enabling multi-step causal reasoning and formal verification.
Reasoning with RAGged Events: RAG-Enhanced Event Knowledge Base Construction and Reasoning with Proof-Assistants
This paper (arXiv:2506.07042) addresses the challenges of extracting structured representations of historical events from narrative text and reasoning about them. It introduces an approach that leverages LLMs, enhanced with knowledge graph information and retrieval-augmented generation (RAG), to automatically construct historical event knowledge bases. The extracted RDF representations are then translated into Coq proof assistant specifications, enabling higher-order reasoning.
Methodology and Experimental Setup
The authors implement a two-phase pipeline. Phase 1 performs semantic event extraction from unstructured historical narratives, encompassing event boundary detection, agent identification, geographical entity resolution, temporal expression normalization, outcome extraction, and RDF knowledge graph construction. Phase 2 covers RDF-to-Coq inductive type conversion, higher-order temporal logic implementation, causal inference framework integration, and proof-assistant compatibility for formal verification. The methodology uses historical texts from Thucydides' History of the Peloponnesian War as a controlled domain. Three LLMs (GPT-4o, Claude-3.5 Sonnet, and Llama 3.2) are each evaluated under three enhancement strategies: base generation, knowledge graph enhancement, and RAG. External knowledge is retrieved from the Wikidata and DBpedia SPARQL endpoints and the ConceptNet API.
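To make Phase 1 concrete, the sketch below shows how a normalized event record might be serialized as RDF/Turtle triples. This is a minimal illustration, not the authors' implementation: the `Event` record, the `ex:` namespace, and all property names are hypothetical assumptions standing in for the paper's actual schema.

```python
# Illustrative Phase 1 output assembly: one extracted, normalized event
# record is serialized as RDF/Turtle. Schema and names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Event:
    event_id: str          # stable identifier for the event node
    agents: list           # identified agents (polities, commanders, ...)
    place: str             # resolved geographical entity
    year: int              # normalized temporal expression (negative = BCE)
    outcome: str           # extracted outcome description

def to_turtle(ev: Event) -> str:
    """Serialize one event as RDF/Turtle triples (hypothetical ex: schema)."""
    lines = [f"ex:{ev.event_id} a ex:HistoricalEvent ;"]
    for agent in ev.agents:
        lines.append(f"    ex:hasAgent ex:{agent} ;")
    lines.append(f"    ex:occurredAt ex:{ev.place} ;")
    lines.append(f'    ex:year "{ev.year}"^^xsd:integer ;')
    lines.append(f'    ex:outcome "{ev.outcome}" .')
    return "\n".join(lines)

ev = Event("battle_of_sybota", ["Corinth", "Corcyra"], "Sybota", -433,
           "indecisive naval engagement")
print(to_turtle(ev))
```

In the paper's pipeline, records like this would be produced by the LLM-driven extraction steps (boundary detection, agent identification, entity resolution) before the resulting graph is handed to Phase 2.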
Key Findings and the Inverse Calibration Principle
The paper reveals that enhancement strategies optimize different performance dimensions rather than providing universal improvements. An "inverse calibration principle" is observed, in which enhancement effectiveness inversely correlates with model capability. Stronger models like GPT-4o and Claude-3.5 Sonnet achieve superior performance through pure base generation, while weaker models like Llama 3.2 require external scaffolding but exhibit extreme sensitivity to implementation quality. Base generation excels at comprehensive historical coverage, while RAG enhancement improves coordinate accuracy and metadata completeness, trading breadth for technical precision. The Coq formalization validates that RAG-discovered event types represent legitimate domain-specific semantic structures.
Limitations of RDF/OWL Systems and the Coq Translation
The authors highlight the computational limitations of RDF/OWL systems, which are constrained to decidable subsets of first-order logic, limiting their ability to express and verify complex historical relationships. To overcome these limitations, they develop an automated translation pipeline that converts extracted RDF/Turtle representations into formal specifications for the Coq proof assistant. This translation unlocks analytical capabilities impossible within RDF frameworks, such as multi-step causal reasoning and formal verification of historical propositions.
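The shape of this translation step can be sketched as a small code generator: event-type IRIs from the RDF graph become constructors of a Coq inductive type, and causal links become Coq statements that a proof assistant can then reason about. Everything here is an assumption for illustration (the function names, the `causes` relation, and the emitted Coq schema); the paper's actual translation pipeline is not published in this summary.

```python
# Hypothetical sketch of RDF-to-Coq conversion: event types extracted from
# the RDF graph are emitted as constructors of a Coq inductive type, and
# causal links as Coq axioms. The `causes` relation is assumed, not the
# paper's actual encoding.

def rdf_types_to_coq(event_types: list) -> str:
    """Emit a Coq inductive type whose constructors are the event types."""
    ctors = "\n".join(f"  | {t}" for t in event_types)
    return f"Inductive EventType : Type :=\n{ctors}.\n"

def causal_axiom(cause: str, effect: str) -> str:
    """Emit a Coq axiom asserting a causal link between two event nodes."""
    return f"Axiom {cause}_causes_{effect} : causes {cause} {effect}."

print(rdf_types_to_coq(["Battle", "Siege", "Alliance", "Revolt"]))
print(causal_axiom("corcyra_dispute", "peloponnesian_war"))
```

Once such specifications are loaded into Coq, multi-step causal reasoning becomes a matter of proving derived propositions (e.g. chaining `causes` facts through a transitivity lemma), which is exactly the kind of higher-order inference that decidable RDF/OWL fragments cannot express.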
Implications and Future Directions
The paper challenges the assumption that more comprehensive retrieval necessarily leads to better performance, demonstrating that optimal RAG design requires careful evaluation of whether external enhancement is necessary. The discovery that pure inferential generation achieves superior overall performance compared to enhanced RAG configurations has significant implications for the field. Future work should explore generalization across domains and historical periods, investigate hybrid approaches, and develop accessible interfaces for formal verification.