MedRule-KG: Rule-Enforced LLMs

Updated 24 November 2025

MedRule-KG is a framework that integrates a compact, auditable biomedical knowledge graph to enforce mathematical and domain rules within LLM outputs.
It employs real-time symbolic fact retrieval and constraint-aware decoding to adjust token probabilities and prevent rule violations.
On a 90-case benchmark, the full system is reported to achieve zero rule violations and an exact-match score of 1.000, with end-to-end latency under 200 ms per query.

MedRule-KG is a knowledge-graph–driven framework for enforcing mathematically and biomedically valid outputs in LLMs without retraining or reliance on extensive tool augmentation. The system integrates a compact, auditable knowledge graph (KG), real-time symbolic fact retrieval and prompt structuring, constraint-aware generation control, and deterministic post-hoc verification. MedRule-KG is designed for settings such as scientific reasoning and early-stage drug discovery, where domain-specific rule violations can compromise reliability and safety (Su, 17 Nov 2025).

1. System Components and Functional Workflow

MedRule-KG is architected around four tightly coupled modules:

Typed Compact Knowledge Graph: Nodes represent compounds (small-molecule drugs with categorical tags such as is_substrate, is_inhibitor, prolongs_qt), enzymes (e.g., CYP3A4, CYP2D6), and risk factors (e.g., QT-prolongation, hepatotoxicity). Edges represent relations such as metabolized_by (compound→enzyme), inhibits (compound→enzyme), and contraindicated_with (compound↔compound). KG triples are stored as (head, relation, tail, confidence ω∈[0,1]), where ω is curated from sources like FDA tables, DrugBank, or probabilistic literature mining. Penalty weights λ_i for each rule in decoding are scaled by ω_i to reflect fact reliability.
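The triple layout described above can be sketched as a small in-memory store. This is a minimal illustration, not the framework's actual data model: the names `Triple`, `CompactKG`, and `scaled_penalty` are hypothetical, and scaling λ by the minimum ω of the supporting facts is an assumption (the source only says that λ_i is scaled by ω_i).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """One KG fact: (head, relation, tail, confidence ω in [0, 1])."""
    head: str
    relation: str
    tail: Optional[str]  # None for unary tags such as prolongs_qt
    confidence: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


class CompactKG:
    """Hypothetical list-backed store; None arguments act as wildcards."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = list(triples)

    def query(self, head=None, relation=None, tail=None) -> List[Triple]:
        return [
            t for t in self._triples
            if (head is None or t.head == head)
            and (relation is None or t.relation == relation)
            and (tail is None or t.tail == tail)
        ]


def scaled_penalty(base_lambda: float, supporting: List[Triple]) -> float:
    """Scale a rule penalty by fact reliability.

    Assumption: the weakest supporting fact (minimum ω) bounds the penalty.
    """
    if not supporting:
        return 0.0
    return base_lambda * min(t.confidence for t in supporting)


kg = CompactKG([
    Triple("A", "inhibits", "CYP3A4", 0.95),
    Triple("B", "metabolized_by", "CYP3A4", 0.90),
    Triple("A", "prolongs_qt", None, 0.80),
])
cyp3a4_facts = kg.query(tail="CYP3A4")
penalty = scaled_penalty(2.0, cyp3a4_facts)
```

A plain list scan is enough for a KG of this size; an index keyed by head or relation would only matter for much larger graphs.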

Prompt Construction and Fact Infusion: On receiving a query (e.g., “Assess co-administration of Drug A and Drug B”), a prompt builder retrieves a set C(x) of top-k=5–10 relevant KG triples. Retrieval ranks triples by φ_r(h,t)+s_text(h,t), combining a translational energy (φ_r) and a string-similarity score (s_text). Retrieved triples are serialized as a mini-table preceding the LLM prompt, e.g.:
KG facts:
1) (A, inhibits, CYP3A4; ω=0.95)
2) (B, metabolized_by, CYP3A4; ω=0.90)
3) (A, prolongs_qt, –; ω=0.80)
...
Question: Are A and B safe to co-administer?

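The retrieval and serialization step can be sketched as follows. Several parts are assumptions made for illustration: `s_text` is approximated with `difflib.SequenceMatcher` against the query tokens, the translational energy φ_r is passed in as a caller-supplied function `phi` that defaults to zero (the embedding model is not described here), and the triple about compound C is a hypothetical distractor.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

# (head, relation, tail, ω); tail is None for unary tags.
Triple = Tuple[str, str, Optional[str], float]


def s_text(query: str, triple: Triple) -> float:
    """Best string similarity between any query token and the head/tail name."""
    head, _, tail, _ = triple
    names = [n.lower() for n in (head, tail) if n]
    tokens = [tok.strip(".,?!").lower() for tok in query.split()]
    return max(
        (SequenceMatcher(None, tok, name).ratio()
         for tok in tokens for name in names),
        default=0.0,
    )


def build_prompt(
    query: str,
    triples: List[Triple],
    phi: Callable[[Triple], float] = lambda t: 0.0,  # placeholder for φ_r
    k: int = 5,
) -> str:
    """Rank triples by phi + s_text, keep the top k, and prepend them to the query."""
    ranked = sorted(triples, key=lambda t: phi(t) + s_text(query, t), reverse=True)
    lines = ["KG facts:"]
    for i, (h, r, t, w) in enumerate(ranked[:k], start=1):
        tail = t if t is not None else "–"
        lines.append(f"{i}) ({h}, {r}, {tail}; ω={w:.2f})")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


facts = [
    ("C", "inhibits", "CYP2D6", 0.70),       # hypothetical distractor
    ("A", "inhibits", "CYP3A4", 0.95),
    ("B", "metabolized_by", "CYP3A4", 0.90),
    ("A", "prolongs_qt", None, 0.80),
]
prompt = build_prompt("Are A and B safe to co-administer?", facts, k=3)
print(prompt)
```

With k=3 the distractor about C is ranked last and dropped, so only facts naming A or B reach the prompt.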
Constraint-Aware Decoding Controller: The model adjusts the next-token probability at each decoding step to penalize partial prefix violations:

$p'(y_t) \propto p(y_t) \cdot \exp\left[-\sum_i \lambda_i \cdot (1-g_{r_i}(y_{1:t}))\right]$

where $g_{r_i}(y_{1:t}) \in [0,1]$ is a differentiable rule satisfaction score and λ_i reflects both rule importance and KG confidence.
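The reweighting formula can be illustrated on a toy two-token vocabulary. This is a sketch under stated assumptions: `reweight` is a hypothetical helper, and the probabilities, rule scores, and λ value are made-up inputs chosen only to show the effect of the penalty term.

```python
import math
from typing import Dict, List


def reweight(
    probs: Dict[str, float],
    rule_scores: Dict[str, List[float]],
    lambdas: List[float],
) -> Dict[str, float]:
    """Apply p'(y_t) ∝ p(y_t) * exp(-Σ_i λ_i (1 - g_i)) and renormalize."""
    unnormalized = {}
    for token, p in probs.items():
        penalty = sum(lam * (1.0 - g) for lam, g in zip(lambdas, rule_scores[token]))
        unnormalized[token] = p * math.exp(-penalty)
    z = sum(unnormalized.values())
    return {token: v / z for token, v in unnormalized.items()}


# Illustrative inputs: the base model prefers "safe", but that continuation
# would violate the single active rule (g = 0), while "contraindicated"
# satisfies it (g = 1).
base = {"safe": 0.6, "contraindicated": 0.4}
scores = {"safe": [0.0], "contraindicated": [1.0]}
adjusted = reweight(base, scores, lambdas=[2.0])
```

Because the penalty is finite, the violating token keeps some probability mass; this is why the framework pairs soft decoding control with a hard post-hoc verifier.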

Deterministic Verifier: Post-generation, the verifier normalizes entity mentions using synonym dictionaries and evaluates each binary rule predicate $g_{r_i}(\hat{y})$. Any rule violation triggers a minimal edit $\Delta y$ to restore satisfaction:

$\min \|\Delta y\| \quad \text{subject to} \quad g_{r_i}(\hat{y} + \Delta y) = 1 \ \forall i$

Edits are typically caveats, dosage revisions, or recommended substitutions. The verifier operates with latency $<5$ ms per instance ($O(M \cdot |\mathcal{R}|)$ for $M \leq 10$ entities and $|\mathcal{R}| \leq 5$ rules).
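A minimal sketch of the verify-and-repair loop is shown below for the reaction-feasibility rule only. The synonym table, the function names, and the choice of "append one caveat sentence" as the minimal edit are all assumptions for illustration; the source does not specify how $\Delta y$ is constructed.

```python
import re
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical synonym dictionary mapping surface forms to canonical IDs.
SYNONYMS: Dict[str, str] = {"drug a": "A", "compound a": "A", "drug b": "B"}


def normalize(text: str, synonyms: Dict[str, str]) -> Set[str]:
    """Return canonical entities whose surface form appears as a whole phrase."""
    lowered = text.lower()
    return {
        canon for surface, canon in synonyms.items()
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered)
    }


def conflicting_pairs(entities: Set[str], facts: List[Fact]) -> List[Tuple[str, str, str]]:
    """Pairs (a, b, enzyme) where a inhibits the enzyme that metabolizes b."""
    pairs = []
    for a, rel_a, enzyme in facts:
        if rel_a != "inhibits" or a not in entities:
            continue
        for b, rel_b, enzyme_b in facts:
            if rel_b == "metabolized_by" and enzyme_b == enzyme and b in entities and b != a:
                pairs.append((a, b, enzyme))
    return pairs


def verify_and_repair(text: str, facts: List[Fact], synonyms: Dict[str, str]) -> Tuple[str, List[str]]:
    """Append a caveat for each conflict not yet acknowledged in the text."""
    entities = normalize(text, synonyms)
    edits = []
    for a, b, enzyme in conflicting_pairs(entities, facts):
        caveat = f"Caution: {a} inhibits {enzyme}, which metabolizes {b}."
        if caveat not in text:
            edits.append(caveat)
    repaired = " ".join([text] + edits) if edits else text
    return repaired, edits


kg_facts = [("A", "inhibits", "CYP3A4"), ("B", "metabolized_by", "CYP3A4")]
draft = "Drug A and Drug B are safe to co-administer."
repaired, edits = verify_and_repair(draft, kg_facts, SYNONYMS)
```

Running the verifier a second time on `repaired` produces no further edits, which is the behavior one would want from a deterministic check.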

2. Constrained Generation by Energy-Based Modeling

MedRule-KG formalizes rule-adherent text generation as MAP inference under an energy-based model:

$\hat{y} = \arg\max_{y \in \mathcal{Y}} \left[ \log p_\theta(y \mid x, K) - \sum_{r_i \in \mathcal{R}} \lambda_i \cdot 1[g_{r_i}(y)=0] \right]$

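For binary predicates $g_{r_i}(y) \in \{0,1\}$, the indicator satisfies $1[g_{r_i}(y)=0] = 1 - g_{r_i}(y)$, so this is equivalent to minimizing the energy: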

$\mathcal{E}(y; \theta, K) = -\log p_\theta(y|x, K) + \sum_i \lambda_i (1 - g_{r_i}(y))$

As direct optimization is intractable due to discrete indicators, a smooth surrogate is used:

$\mathcal{L}_{\text{soft}}(y) = -\log p_\theta(y|x,K) + \sum_i \lambda_i (1-g_{r_i}(y))$

Here $g_{r_i}(y) \in [0,1]$ is a continuous score. Gradient signals from $\mathcal{L}_{\text{soft}}$ reweight token probabilities during decoding, integrating symbolic prior penalties directly into the autoregressive prediction process (Su, 17 Nov 2025).
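One way to read the link between the sequence-level objective and the token-level decoding rule is sketched below. It is an illustrative derivation rather than one quoted from the source, and it assumes the rule scores can be evaluated on prefixes $y_{1:t}$, as the decoding controller does.

```latex
\begin{align*}
p'(y) &\propto \exp\bigl(-\mathcal{L}_{\text{soft}}(y)\bigr) \\
      &= \exp\Bigl(\log p_\theta(y \mid x, K) - \sum_i \lambda_i \bigl(1 - g_{r_i}(y)\bigr)\Bigr) \\
      &= p_\theta(y \mid x, K) \cdot \exp\Bigl[-\sum_i \lambda_i \bigl(1 - g_{r_i}(y)\bigr)\Bigr]
\end{align*}
```

Replacing the full sequence $y$ with the prefix $y_{1:t}$ gives the same form as the per-step reweighting used by the decoding controller, so a low-energy sequence is one that is both likely under the base model and close to satisfying every rule.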

3. Domain Rule Sets and Enforcement Mechanisms

MedRule-KG encodes three primary families of biomedical and mathematical constraints:

Rule Family	Predicate Example	Scoring Function
Reaction Feasibility (R1)	Compounds A, B should not co-occur if A inhibits enzyme E and B is metabolized_by E	$g_{R1}(A,B) = 1 - 1[(A,\text{inhibits},E) \land (B,\text{metabolized\_by},E)]$
Metabolic Compatibility (R2)	Enzyme-based partial conflicts, e.g., enzyme induction vs. inhibition	$g_{R2}(A,B) = 1 - \frac{|\mathrm{Enz}(A) \cap \mathrm{Enz}(B)|}{\max(|\mathrm{Enz}(A)|,|\mathrm{Enz}(B)|)}$
Toxicity Safety (R3)	Shared risk factors such as QT-prolongation	$g_{R3}(A,B) = 1 - \max_{r \in Risks} [1[(A,\text{hasRisk},r) \land (B,\text{hasRisk},r)]]$

Logical rule composition remains continuous and differentiable almost everywhere: $g_{A \land B} = g_A \cdot g_B$, $g_{A \lor B} = \max(g_A, g_B)$. The deterministic verifier performs rule checks using canonicalized entities against KG facts, flagging violations and triggering minimal post-hoc textual corrections or, if necessary, re-generation with stricter penalties.
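The three scoring functions and the composition operators translate directly into set operations. The sketch below follows the formulas in the table; the function names and the convention that an empty enzyme set yields a score of 1 (to avoid division by zero) are assumptions.

```python
import math
from typing import Set


def g_r1(inhibited_by_a: Set[str], metabolizers_of_b: Set[str]) -> float:
    """Reaction feasibility: 0 if A inhibits any enzyme that metabolizes B."""
    return 0.0 if inhibited_by_a & metabolizers_of_b else 1.0


def g_r2(enz_a: Set[str], enz_b: Set[str]) -> float:
    """Metabolic compatibility: 1 - |Enz(A) ∩ Enz(B)| / max(|Enz(A)|, |Enz(B)|)."""
    if not enz_a or not enz_b:
        return 1.0  # assumption: no enzymes on one side means no overlap
    return 1.0 - len(enz_a & enz_b) / max(len(enz_a), len(enz_b))


def g_r3(risks_a: Set[str], risks_b: Set[str]) -> float:
    """Toxicity safety: 0 if A and B share any risk factor."""
    return 0.0 if risks_a & risks_b else 1.0


def g_and(*scores: float) -> float:
    """Conjunction as a product of scores."""
    return math.prod(scores)


def g_or(*scores: float) -> float:
    """Disjunction as the maximum score."""
    return max(scores)


r1 = g_r1({"CYP3A4"}, {"CYP3A4"})
r2 = g_r2({"CYP3A4", "CYP2D6"}, {"CYP3A4"})
r3 = g_r3({"QT-prolongation"}, {"hepatotoxicity"})
combined = g_and(r2, r3)
```

The product form means a single hard violation (a score of 0) drives any conjunction to 0, while partial overlaps such as the R2 score degrade it gradually.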

4. Empirical Evaluation and Benchmarking

The evaluation task set comprises $N=90$ two-entity cases derived from the FDA and DrugBank, categorized as:

20 Reaction Feasibility (R1-only) cases
20 Metabolic Compatibility (R2-only) cases
30 “Both” (R1+R2 simultaneously)
20 “None” (no encoded constraints)

Key metrics include Exact Match (EM), Rule Violations (VR), and Safety–Accuracy Index ($\text{SAI} = \text{EM} \cdot (1-\text{VR})^2$). Statistical methods applied are Wilson-score confidence intervals, two-proportion z-tests, stratified Cochran–Mantel–Haenszel tests, and bootstrap resampling (10,000 replicates). Key results as reported:
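The SAI formula and a Wilson-score interval can be computed in a few lines. The interval below is the standard textbook formula with z = 1.96 (a 95% level is an assumption); whether the reported intervals were produced exactly this way is not stated here. The count of 69 out of 90 is inferred from an EM of 0.767 on N = 90 and is used only as an example input.

```python
import math
from typing import Tuple


def sai(em: float, vr: float) -> float:
    """Safety-Accuracy Index: EM * (1 - VR)^2."""
    return em * (1.0 - vr) ** 2


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


baseline_sai = sai(0.767, 0.233)   # CoT baseline figures from the results table
full_sai = sai(1.000, 0.000)       # full system
low, high = wilson_interval(69, 90)
```

Because VR enters squared, SAI penalizes rule violations more steeply than it rewards exact-match gains, which is why the baseline's SAI falls well below its EM.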

System	EM	VR
Chain-of-Thought (CoT) Baseline	0.767 [0.678, 0.856]	0.233 [0.144, 0.322]
CoT + KG (no verifier)	0.900 [0.815, 0.959]	0.133 [0.041, 0.222]
KG + Verifier (MedRule-KG)	1.000 [1.000, 1.000]	0.000 [0.000, 0.000]

Zero violations are achieved in all rule categories when the full system is used. For “Both” constraint tasks, stratified EM increases from 0.60 in the baseline to 1.00. Performance improvements persist and uncertainties decrease as the task set size increases, consistent with uncertainty shrinking at the rate $1/\sqrt{N}$ (Su, 17 Nov 2025).

5. Scalability, Latency, and System Practicalities

MedRule-KG maintains practical latency and scalability:

Runtime: End-to-end latency on an A100 GPU is $<200$ ms per query ($\approx 140$ ms decoding, $\approx 40$ ms prompt retrieval, $<5$ ms verification). This supports interactive rates of $\approx$5–7 queries per second.
Verifier Complexity: Post-hoc verification is $O(M \cdot |\mathcal{R}|)$, where $M<10$ entities and $|\mathcal{R}|<5$, yielding negligible overhead.
KG Characteristics: The KG is compact, typically $<5$ MB, supporting rapid retrieval. Prompt length saturates efficacy at 5–7 facts, with additional facts causing attention dilution.
Coverage Limitations: Rare enzymes or risk factors may be absent due to KG size. A soft-verifier variant allows trading strictness for speed, resulting in $\approx8\%$ rule violations.
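As a quick consistency check on the runtime figures, the component latencies can be summed and converted to throughput. The calculation assumes strictly sequential processing of one query at a time and takes the verification bound of 5 ms at face value.

```latex
\begin{align*}
t_{\text{total}} &\approx 140 + 40 + 5 = 185\ \text{ms} \quad (< 200\ \text{ms}) \\
\text{throughput} &\approx \frac{1000\ \text{ms/s}}{185\ \text{ms/query}} \approx 5.4\ \text{queries/s}
\end{align*}
```

This sits at the lower end of the stated range of roughly 5 to 7 queries per second; decoding accounts for about three quarters of the budget, so it is the first place to look for further speedups.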

6. Limitations, Extensions, and Domain Adaptability

Coverage is inherently constrained by KG completeness—extremely rare entities may be omitted, affecting recall for edge cases. The “soft-verifier” variant provides faster operation but reintroduces nontrivial violation rates. Possible avenues for extension include augmenting the KG with hierarchical ontologies, expanding rule families to address more complex scientific sub-domains, or leveraging minimal program synthesis for advanced algebraic reasoning.

MedRule-KG’s decoupled, interpretable design—using a lightweight, symbolic KG scaffold, soft control during generation, and hard post-hoc verification—enables reliable, high-accuracy scientific reasoning without model retraining or elaborate toolchains. The framework is suitable for any domain mandating hard rule compliance and scales to interactive, real-time scientific and engineering assistant scenarios (Su, 17 Nov 2025).

References

Su (17 Nov 2025). MedRule-KG: A Knowledge-Graph-Steered Scaffold for Reliable Mathematical and Biomedical Reasoning.
