
GraphMERT: Neurosymbolic KG Extraction

Updated 19 March 2026
  • GraphMERT is a neurosymbolic framework that unifies neural abstraction with symbolic reasoning to reliably extract domain-specific knowledge graphs from unstructured text.
  • It leverages a RoBERTa-style encoder with hierarchical graph attention and graph-distance aware mechanisms to enhance triple extraction accuracy and traceability.
  • The model outperforms LLM baselines by achieving higher factuality and ontological validity, significantly boosting downstream biomedical QA performance.

GraphMERT is a compact neurosymbolic framework for the efficient and scalable extraction of reliable, domain-specific knowledge graphs (KGs) from unstructured text. It integrates neural and symbolic paradigms to address longstanding challenges in knowledge graph induction, pairing the generalization of neural models with the explicitness and interpretability of symbolic representations. The architecture is reported to achieve state-of-the-art KG quality, especially in high-stakes settings such as biomedical informatics: its triples are factual (each is traceable to its source text) and ontologically valid, and it outperforms LLM baselines on triple-level metrics and end-task utility (Belova et al., 10 Oct 2025).

1. Motivation and Problem Setting

Knowledge graphs are the gold-standard structure for explicit semantic knowledge representation, frequently leveraged in domains where abstraction, auditability, and verifiable reasoning are paramount. Neurosymbolic approaches, combining neural computation for abstraction and symbolic methods for explicit reasoning, have been proposed for decades but have not achieved mainstream, scalable adoption due to: (1) inefficiencies and brittleness in rule- or embedding-based KG induction, (2) the implicitness, hallucination risk, prompt sensitivity, and provenance opacity of LLM-generated triples, and (3) lack of scalable, factual, and valid graph distillation protocols.

GraphMERT situates itself at the intersection of these requirements, targeting KGs that deliver:

  • Factuality: Each triple is grounded in and traceable to a specific sentence or abstract.
  • Validity: Each relation and entity conforms with a domain-specific ontology (e.g., SNOMED CT, UMLS).

This dual criterion ensures both high reliability and domain appropriateness, directly addressing failings of existing LLM and rule-based extraction methods (Belova et al., 10 Oct 2025).

2. Architectural Components

2.1 Encoder and Input Representation

GraphMERT, denoted as $\mathcal{F}(G;\theta)$, adopts a RoBERTa-style encoder-only architecture with approximately 80 million parameters. Inputs are encoded as “leafy chain graphs”: fixed-size graphs in which root nodes represent textual token sequences, and sparse leaf nodes store injected seed KG triples and their relations.
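A minimal sketch of how such a fixed-size input could be laid out may help make the idea concrete. The class name, the number of leaf slots per root, the padding token, and the choice to store tail tokens in the leaves of the head's root are all assumptions made for illustration; the source does not specify these details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeafyChainGraph:
    """Chain of root (text-token) nodes, each with a fixed number of leaf slots."""
    roots: List[str]
    leaves_per_root: int = 2          # assumption: slot count is illustrative
    pad: str = "<pad>"                # assumption: empty leaves hold a pad token
    leaves: List[List[str]] = field(init=False)
    relations: List[Optional[str]] = field(init=False)

    def __post_init__(self):
        # Every root gets the same number of leaf slots, so the graph size is fixed.
        self.leaves = [[self.pad] * self.leaves_per_root for _ in self.roots]
        self.relations = [None] * len(self.roots)

    def inject(self, root_index: int, relation: str, tail_tokens: List[str]) -> None:
        """Attach a seed triple's relation and tail tokens to one root's leaves."""
        if len(tail_tokens) > self.leaves_per_root:
            raise ValueError("tail does not fit in the leaf slots")
        padding = [self.pad] * (self.leaves_per_root - len(tail_tokens))
        self.leaves[root_index] = list(tail_tokens) + padding
        self.relations[root_index] = relation

    def num_nodes(self) -> int:
        """Total node count: roots plus all leaf slots (filled or padded)."""
        return len(self.roots) * (1 + self.leaves_per_root)


# Illustrative usage: most leaves stay padded, so the injected triples are sparse.
graph = LeafyChainGraph(roots=["metformin", "lowers", "blood", "glucose"])
graph.inject(0, "isa", ["biguanide"])
```

Because unfilled slots are padded rather than removed, every input has the same shape regardless of how many seed triples were injected.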

2.2 Semantic Embedding via Hierarchical Graph Attention Network (H-GAT)

For each seed triple $\langle h, r, t \rangle$ (with entities $h$ and $t$, and relation $r$), GraphMERT applies an H-GAT module that fuses the embedding of each tail token $t_i$ with all head token embeddings $\{h_j\}$ and a learnable relation embedding $W_r$. The propagation utilizes a LeakyReLU attention mechanism:

$$e_{ij}^{(r)} = \mathrm{LeakyReLU}\bigl(a_r^\top[\,W_r t_i \,\|\, W_r h_j]\bigr), \quad \alpha_{ij}^{(r)} = \frac{\exp(e_{ij}^{(r)})}{\sum_k \exp(e_{ik}^{(r)})}$$

$$t_i' = t_i + \sum_j \alpha_{ij}^{(r)} W_r h_j$$

This ensures each masked leaf encodes relation-specific semantic information within the transformer embedding layer.
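The fusion step can be traced numerically with a small pure-Python sketch. It follows the three formulas for $e_{ij}^{(r)}$, $\alpha_{ij}^{(r)}$, and $t_i'$ for a single tail token; the function names, the LeakyReLU slope of 0.2, and the use of a square $W_r$ (so that $W_r h_j$ can be added to $t_i$) are assumptions for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    # slope=0.2 is an assumed value, not taken from the source
    return z if z > 0 else slope * z


def hgat_fuse(tail, heads, W_r, a_r):
    """Return (t_i', alphas) for one tail token embedding and its head tokens."""
    Wt = matvec(W_r, tail)
    Wh = [matvec(W_r, h) for h in heads]
    # e_ij = LeakyReLU(a_r^T [W_r t_i || W_r h_j])
    scores = [leaky_relu(sum(a * v for a, v in zip(a_r, Wt + wh))) for wh in Wh]
    # alpha_ij = softmax over head tokens (max subtracted for numerical stability)
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # t_i' = t_i + sum_j alpha_ij * W_r h_j
    fused = list(tail)
    for alpha, wh in zip(alphas, Wh):
        for d in range(len(fused)):
            fused[d] += alpha * wh[d]
    return fused, alphas
```

With an identity $W_r$ and a zero attention vector $a_r$, the attention is uniform and the tail embedding is simply shifted by the mean of the head embeddings, which is a convenient sanity check.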

2.3 Graph-Distance-Aware Attention

Attention weights $A_{ij}$ between nodes $i$ and $j$ are augmented with an exponential decay mask, parameterized by the shortest-path distance $\mathrm{sp}(i,j)$ within the chain graph. The decay includes a learnable parameter. This bias enforces locality such that prediction for masked leaves is informed by both text and nearby semantic structures, aligning neural attention with the symbolic graph topology.
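The effect of a distance-based decay on attention can be illustrated with a short sketch. The exact functional form is an assumption here: the code uses `base ** GELU(sqrt(distance) - offset)`, where `offset` stands in for the learnable parameter and `base` is a hypothetical fixed constant in (0, 1). Multiplying the weights by the mask and renormalizing each row is likewise an assumed design, and the graph is assumed to be connected.

```python
import math
from collections import deque


def shortest_paths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def decay(distance, base=0.6, offset=1.0):
    """Assumed decay form: base ** GELU(sqrt(distance) - offset)."""
    return base ** gelu(math.sqrt(distance) - offset)


def masked_attention(weights, adj, base=0.6, offset=1.0):
    """Scale each attention weight by the decay of its graph distance, then renormalize."""
    n = len(weights)
    out = []
    for i in range(n):
        dist = shortest_paths(adj, i)  # assumes every node is reachable from i
        row = [weights[i][j] * decay(dist[j], base, offset) for j in range(n)]
        total = sum(row)
        out.append([w / total for w in row])
    return out
```

Starting from uniform attention, nodes that are fewer hops away end up with larger weights, which is the locality bias described above.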

3. Neurosymbolic KG Distillation Pipeline

3.1 Seed KG Injection and Context-Driven Selection

The distillation process begins with a high-quality, domain-specific seed KG (e.g., triples spanning 28 relations drawn from SNOMED CT and Gene Ontology). Entity linking leverages SapBERT embeddings and character-level n-gram Jaccard filtering to match textual entities to UMLS concepts. Contextual triple selection ranks relevant seed triples by cosine similarity (via text-embedding-004) and injects a single diverse triple per head entity, suppressing overrepresentation of common relations (e.g., “isa”).
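The two filtering ideas in this step, character n-gram Jaccard overlap and similarity-ranked selection of one triple per head, can be sketched as follows. The trigram size, the greedy "prefer an unused relation" rule for diversity, and the toy embeddings are assumptions; in the described pipeline the vectors would come from an embedding model rather than being written by hand.

```python
import math


def char_ngrams(text, n=3):
    """Set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of two strings."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_triples(context_vec, candidates):
    """Pick one (head, relation, tail) per head from (head, rel, tail, vec) tuples.

    Assumed diversity rule: rank by similarity to the context, first give each
    head its best triple whose relation is still unused, then fall back to the
    best remaining triple for heads that got nothing.
    """
    ranked = sorted(candidates, key=lambda c: cosine(context_vec, c[3]), reverse=True)
    chosen, used_relations = {}, set()
    for head, rel, tail, _ in ranked:
        if head not in chosen and rel not in used_relations:
            chosen[head] = (head, rel, tail)
            used_relations.add(rel)
    for head, rel, tail, _ in ranked:
        chosen.setdefault(head, (head, rel, tail))
    return list(chosen.values())
```

In this sketch a frequent relation such as "isa" is used for at most one head unless no alternative exists, which is one simple way to limit its overrepresentation.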

3.2 Joint Training with Masked-Language and Masked-Node Modeling

GraphMERT optimizes a composite loss across both textual spans and masked graph nodes. The loss combines a masked-language-modeling term over the masked text spans, a masked-node-modeling term over the masked leaf (tail) nodes, and the span-boundary loss from SpanBERT for alignment. Dropout on relation embeddings is applied to prevent overfitting due to seed triple scarcity.
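A rough sketch of how such a joint objective could be assembled is shown below. It treats both terms as mean negative log-likelihoods over their masked positions and adds a precomputed span-boundary term; the plain sum, the `node_weight` factor, and all function and argument names are assumptions, since the exact weighting is not given here.

```python
import math


def nll(probs, targets, positions):
    """Mean negative log-likelihood of the target ids at the masked positions."""
    if not positions:
        return 0.0
    return -sum(math.log(probs[i][targets[i]]) for i in positions) / len(positions)


def composite_loss(token_probs, token_targets, masked_tokens,
                   leaf_probs, leaf_targets, masked_leaves,
                   sbo_loss=0.0, node_weight=1.0):
    """Masked-language term + weighted masked-node term + span-boundary term."""
    mlm = nll(token_probs, token_targets, masked_tokens)   # masked text spans
    mnm = nll(leaf_probs, leaf_targets, masked_leaves)     # masked leaf (tail) nodes
    return mlm + node_weight * mnm + sbo_loss               # assumed: simple weighted sum
```

Here `token_probs[i]` and `leaf_probs[i]` are predicted distributions over the vocabulary at position `i`, so both terms are computed the same way and differ only in which nodes are masked.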

3.3 Triple Extraction at Inference

KG extraction from raw text involves:

  1. Identification of head span $h$ and relation $r$ using a helper LLM constrained by the set of seed relations.
  2. Masking the associated leaf node and prediction of top-$k$ tokens for the tail span $t$.
  3. Assembly of tail phrases by the helper LLM, filtered to ensure token set membership.
  4. Filtering out any candidate triple whose cosine similarity to its source sentence falls below a fixed threshold.
  5. Deduplication to form the final KG, with each triple linked explicitly to its source sentence for provenance (Belova et al., 10 Oct 2025).
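The post-processing part of these steps (the relation constraint, the token-membership check, similarity filtering, and deduplication with provenance) can be sketched without any model calls. The `Candidate` fields, whitespace tokenization of the tail, case-insensitive deduplication, and the threshold value used in the example are all assumptions; the embeddings and predicted tokens are taken as given inputs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    head: str
    relation: str
    tail: str
    sentence: str          # source sentence, kept for provenance
    predicted: frozenset   # top-k tokens predicted for the masked leaf
    triple_vec: tuple      # embedding of the triple (assumed precomputed)
    sentence_vec: tuple    # embedding of the source sentence (assumed precomputed)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_kg(candidates, seed_relations, threshold):
    """Return {(head, relation, tail): [source sentences]} after filtering."""
    kg = {}
    for c in candidates:
        if c.relation not in seed_relations:
            continue  # relation must belong to the seed relation set
        if not set(c.tail.lower().split()) <= c.predicted:
            continue  # tail may only use tokens predicted for the leaf
        if cosine(c.triple_vec, c.sentence_vec) < threshold:
            continue  # triple is too dissimilar to its source sentence
        key = (c.head.lower(), c.relation, c.tail.lower())
        sources = kg.setdefault(key, [])
        if c.sentence not in sources:
            sources.append(c.sentence)  # duplicates merge, provenance accumulates
    return kg
```

Keying the result by the normalized triple and storing a list of sentences means deduplication never discards provenance: a triple extracted from several sentences keeps a pointer to each of them.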

4. Evaluation Metrics and KG Quality

4.1 FActScore* (Factuality)

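FActScore* measures the proportion of triples $\tau$ in the extracted KG $G$ for which $\tau$ is logically supported by the corresponding source text $C(\tau)$ and is well-formed:

$$f(\tau) = \mathbb{1}\bigl[\tau \text{ is well-formed and supported by } C(\tau)\bigr]$$

$$\mathrm{FActScore}^{*}(G) = \frac{1}{|G|} \sum_{\tau \in G} f(\tau)$$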

4.2 ValidityScore (Ontology Consistency)

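ValidityScore assesses whether each triple $\tau$ in the extracted KG $G$ is consistent with the domain ontology, using an LLM judge prompt for schema validation:

$$v(\tau) = \mathbb{1}\bigl[\tau \text{ is judged ontologically valid}\bigr]$$

$$\mathrm{ValidityScore}(G) = \frac{1}{|G|} \sum_{\tau \in G} v(\tau)$$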

These metrics are complemented by end-task evaluations, such as GraphRAG-based question-answering accuracy that indirectly captures global KG coherence and coverage.
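Both triple-level scores reduce to the share of triples that a judge labels positively, which the following sketch computes from a list of per-triple verdicts. The label strings ("yes", "maybe", "no") and the function name are assumptions for illustration; the judging itself is outside the sketch.

```python
def rate(verdicts, label):
    """Percentage of judged triples that received the given label."""
    if not verdicts:
        raise ValueError("no triples were judged")
    return 100.0 * sum(v == label for v in verdicts) / len(verdicts)


# Illustrative verdicts for four triples (not real evaluation data).
example_verdicts = ["yes", "yes", "no", "maybe"]
example_rates = {label: rate(example_verdicts, label) for label in ("yes", "maybe", "no")}
```

The same helper serves for a factuality judge or a validity judge; only the source of the verdicts changes.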

5. Experimental Setup and Results

GraphMERT was evaluated in the biomedical domain for diabetes-related concept extraction using PubMed abstracts:

  • Training Data: 350k abstracts (124.7M tokens); evaluation on 39k abstracts (13.9M tokens).
  • Seed KG: triples from UMLS SNOMED CT and Gene Ontology.
  • Model Configuration: 12 layers, hidden size 512, 8 heads, 79.7M parameters. Training used 4×H100 GPUs over 25 epochs, with batch size 128 and relation-embedding dropout of 0.3.
  • Helper LLM: Qwen3-32B (8-bit quantized) for head discovery, relation typing, and tail assembly.

After extraction and filtering, GraphMERT produced 109,293 unique triples across 28 relations. In comparison, the Qwen3-32B LLM baseline generated 272,346 triples but with markedly lower precision.

Triple-level evaluation demonstrates:

| Model | FActScore* (%) | ValidityScore (%) | "No" Rate (%) |
|---|---|---|---|
| GraphMERT | 69.8 | 68.8 | 10.8 |
| Qwen3-32B LLM baseline | 40.2 | 43.0 | 31.4 |

GraphMERT’s KG demonstrates higher factuality and ontological validity. In downstream evaluation on ICD-Bench (GraphRAG QA, endocrinology subset, 69 questions): LLM baseline KG achieved 50.2% accuracy, the seed KG 53.1%, and GraphMERT KG 59.4%. On public medical QA benchmarks, GraphMERT KG yields up to +3.7% accuracy improvement over baseline (Belova et al., 10 Oct 2025).

6. Impact, Limitations, and Future Directions

GraphMERT advances neurosymbolic AI by fusing neural generalization with symbolic transparency in a computationally efficient paradigm. Its design enables:

  • Direct, end-to-end traceability of every extracted triple to its source text, enabling explicit provenance and auditability.
  • Superior factuality and validity relative to LLM-based and rule-based baselines.
  • Scalability to large text corpora with a compact encoder architecture, reducing computational cost compared to massive LLM retraining.

Limitations include dependency on a curated seed KG (roughly 100–1,000 samples per relation), reliance on a helper LLM for token assembly (which can yield incomplete tail entities), and a fixed relation set, so expanding the relation set requires retraining. Planned future work targets removing the helper LLM via direct span decoding in semantic space, developing fully neural graph decoders, refining graph-level QA and retrieval metrics, and adapting the approach to other knowledge-rich domains (e.g., law, finance) to support domain-specific intelligent systems (Belova et al., 10 Oct 2025).

7. Significance Within Neurosymbolic AI

GraphMERT is presented as the first efficient and scalable neurosymbolic model for distilling reliable, domain-specific KGs from unstructured text. By combining encoder-based neural abstraction with explicit, ontology-grounded symbolic triples, it bridges a longstanding gap between neural and symbolic AI. Its contributions, especially regarding provenance, auditability, and domain validity, are most relevant to informatics disciplines where interpretability and rigor are indispensable (Belova et al., 10 Oct 2025).
