Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach
"Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach" (Sadowski et al., 19 Jun 2025 ) presents a neural-symbolic framework specifically designed to address the persistent challenges encountered by LLMs in rule-based reasoning scenarios—namely, their tendency toward inconsistent rule application, suboptimal exception handling, and opaqueness in reasoning steps. This work targets domains such as legal analysis, where both interpretative flexibility and deterministic logical verification are critically needed.
Structured Prompting Framework
The proposed system is grounded in a structured decomposition methodology that externalizes task definitions, enabling domain experts to precisely define logical structures (terms, predicates, and rules) without modifying the core system architecture. The reasoning process is structured as a transparent pipeline of three explicit steps, sketched in code after the list:
- Entity (Term) Identification: The LLM identifies relevant spans in the text as domain-specific entities, justified according to externally defined term criteria.
- Predicate (Property) Extraction: The LLM assigns predicates (relationships/properties) to extracted entities, again with stepwise justification.
- Symbolic Rule Application (Verification): Extracted predicates are composed into formal logic expressions. An SMT solver or similar tool deterministically checks rule satisfaction, enabling formal verification of rule application.
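A minimal sketch of this pipeline follows, assuming Z3 (`pip install z3-solver`) as the verification backend (the paper specifies only "an SMT solver or similar tool") and a hypothetical `llm_call` wrapper; the prompt strings, function names, and JSON response convention are illustrative, not the paper's implementation.

```python
import json
from z3 import And, Bool, BoolRef, Not, Solver, sat

def llm_call(prompt: str) -> str:
    """Placeholder for an LLM API call; wire up a real client here."""
    raise NotImplementedError

def identify_entities(text: str, term_definitions: str) -> str:
    # Step 1: extract domain-specific spans as entities, each justified
    # against the externally supplied term criteria.
    return llm_call(
        f"Term definitions:\n{term_definitions}\n\nText:\n{text}\n\n"
        "Identify each term instance and justify it against the definitions."
    )

def extract_predicates(entities: str, predicate_definitions: str) -> dict[str, bool]:
    # Step 2: assign predicates to the extracted entities. We assume the LLM
    # is instructed to answer in JSON, e.g. {"IsStatement": true, ...}.
    response = llm_call(
        f"Predicate definitions:\n{predicate_definitions}\n\n"
        f"Entities:\n{entities}\n\nReturn a JSON object of predicate truth values."
    )
    return json.loads(response)

def verify_rule(assignments: dict[str, bool], rule: BoolRef) -> bool:
    # Step 3: assert the LLM's predicate assignments as hard facts, then let
    # the solver decide deterministically whether the rule holds under them.
    solver = Solver()
    for name, value in assignments.items():
        literal = Bool(name)
        solver.add(literal if value else Not(literal))
    solver.add(rule)
    return solver.check() == sat

# Example rule: a statement, made out of court, offered to prove its truth.
hearsay = And(Bool("IsStatement"), Bool("IsOutOfCourt"), Bool("OfferedForTruth"))
# verify_rule({"IsStatement": True, "IsOutOfCourt": True,
#              "OfferedForTruth": True}, hearsay)  -> True
```

Because every predicate is fully assigned before the check, satisfiability coincides with entailment, so a `sat` result means the rule genuinely applies under the LLM's extraction.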
This separation not only mirrors best practices from both symbolic AI and modern prompting but also mitigates typical failure modes where LLMs conflate linguistic interpretation with rigid logical structure, or where they fail to provide a clear audit trail for their decisions.
Case Study and Design Rationale
The framework is evaluated on the LegalBench "hearsay determination" task, requiring the identification of statements fitting the legal definition of hearsay under the Federal Rules of Evidence. This domain is particularly suitable due to the interplay between nuanced text interpretation and the strictures of statutory logic.
A notable innovation highlighted by the paper is the use of complementary predicate pairs (e.g., IsInCourt vs. IsOutOfCourt). This approach compels the LLM to explicitly adjudicate between mutually exclusive properties, suppressing bias toward affirmative selections and offering empirical improvements in rule application precision with models amenable to stepwise reasoning.
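A hedged sketch of how such a pair can be enforced at the symbolic layer, again using Z3: the predicate names mirror the paper's example, but the exact encoding (an exclusive-or constraint alongside the hearsay rule) is an assumption.

```python
from z3 import And, Bool, Not, Solver, Xor, sat, unsat

in_court, out_of_court = Bool("IsInCourt"), Bool("IsOutOfCourt")
statement, for_truth = Bool("IsStatement"), Bool("OfferedForTruth")

# Exactly one member of the complementary pair may hold; an LLM that affirms
# both (or neither) yields an unsatisfiable state instead of a quiet bias.
mutual_exclusion = Xor(in_court, out_of_court)
hearsay_rule = And(statement, out_of_court, for_truth)

def classify(assignments: dict[str, bool]) -> str:
    solver = Solver()
    solver.add(mutual_exclusion)
    for name, value in assignments.items():
        solver.add(Bool(name) if value else Not(Bool(name)))
    if solver.check() == unsat:
        return "inconsistent predicates"  # complementary pair violated
    solver.add(hearsay_rule)
    return "hearsay" if solver.check() == sat else "not hearsay"

# The pair catches an over-affirmative model that marks both predicates true:
print(classify({"IsStatement": True, "IsInCourt": True,
                "IsOutOfCourt": True, "OfferedForTruth": True}))
# -> inconsistent predicates
print(classify({"IsStatement": True, "IsInCourt": False,
                "IsOutOfCourt": True, "OfferedForTruth": True}))
# -> hearsay
```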
Empirical Results
On the LegalBench hearsay test, the framework, when paired with task definitions that include complementary predicates, achieves substantial performance improvements on OpenAI's "o-family" models: F1 scores of 0.929 (o1) and 0.867 (o3-mini), representing 12–14 percentage point gains over their respective few-shot baselines. These results are robust to ablations: direct (non-symbolic) prompting with identical predicate definitions closes some of the gap but never reaches full-system performance, underlining the value of explicit decomposition and symbolic post-processing.
Other major model families (Anthropic Claude, Meta Llama, and DeepSeek) display mixed results, with some exhibiting decreased precision or recall as decomposition complexity increases. Observed performance stratifies models according to their reasoning depth and tolerance for prompt complexity, reinforcing the need for model-specific strategy selection in practical deployments.
Explainability, Limitations, and Theoretical Implications
The explicit, three-phase decomposition yields clear inspection points for human review—particularly relevant for legal or high-stakes domains, where justification traceability is paramount. The system’s architecture readily maps intermediate LLM outputs (entity and predicate assignments) to final symbolic verdicts, facilitating error localization and the refinement of rule definitions by non-technical stakeholders.
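To make these inspection points concrete, a hypothetical trace record might retain each phase's output alongside the symbolic verdict; the field names below are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class ReasoningTrace:
    """One auditable record per input, mirroring the three pipeline phases."""
    text: str                     # the input under analysis
    entities: dict[str, str]      # phase 1: span -> justification
    predicates: dict[str, bool]   # phase 2: predicate -> assigned truth value
    rationales: dict[str, str]    # phase 2: predicate -> LLM justification
    rule: str                     # the formal rule that was checked
    verdict: bool                 # phase 3: deterministic symbolic result

    def disputed_points(self, gold_verdict: bool) -> list[str]:
        # When the verdict disagrees with a gold label, the fault must lie in
        # entity/predicate extraction or in the rule definition itself;
        # listing the assignments shows a reviewer which claims to contest.
        if self.verdict == gold_verdict:
            return []
        return [f"{name}={value}: {self.rationales.get(name, 'no rationale')}"
                for name, value in self.predicates.items()]
```

Because the symbolic step is deterministic, any disagreement with a gold label can be attributed either to an upstream extraction error or to a miswritten rule, which is what makes refinement by non-technical stakeholders tractable.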
However, several caveats are acknowledged. The necessity for rigorous upfront predicate definition may limit adaptability in domains with highly fluid or underdefined rule sets. Moreover, realized improvements are not uniform across all LLM families: architectural alignment with the structured prompting paradigm is a precondition for maximal benefit, as evidenced by divergent results on certain transformer models.
Theoretically, this work exemplifies an effective pattern for neural-symbolic integration within applied NLP. The methodological commitment to decomposability and formal verification points toward scalable, transparent AI reasoning systems, suitable for regulatory or adversarial settings. It also paves the way for further advances, such as:
- Integration with higher-order logics (e.g., deontic or counterfactual operators) to support more nuanced legal/statutory interpretation.
- Automated predicate discovery or adaptation, reducing reliance on manual formalization for task bootstrapping.
- Multi-agent and adversarial workflows, where negotiation or iterative refinement of rules may more faithfully replicate real-world settings.
Future Directions
Research inspired by this approach should seek to generalize the framework to open-world and multi-label tasks, and to investigate adaptation mechanisms whereby LLMs themselves assist in co-evolving predicate definitions. Further empirical work is required to characterize the boundary between beneficial and detrimental prompt complexity across architectures. The demonstrated model dependence argues for adaptive or meta-prompting routines capable of calibrating decomposition detail to the capabilities of the underlying LLM.
Overall, this paper provides both a technically granular blueprint and a quantitatively substantiated rationale for hybrid architectures in explainable, rule-based language tasks. The findings support a broader thesis: carefully designed decomposition and external symbolic formalism are essential tools for deploying LLM-based systems in domains where explainability and procedural correctness are not negotiable.