- The paper introduces MemGraph, a framework that leverages memory graph traversal to extract entities and ontologies for enhanced patent matching.
- MemGraph integrates retrieval augmentation and generation guidance, achieving up to 17.68% improvement in matching accuracy and reducing model uncertainty.
- Empirical evaluations across various LLM backbones demonstrate MemGraph’s robustness and scalability for applications in intellectual property management.
Enhancing Patent Matching in LLMs via Memory Graph Traversal
Introduction
Patent matching is a central task in intellectual property management, requiring the identification of semantically similar patents to support innovation tracking, infringement prevention, and prior art search. The complexity of patent documents, characterized by hierarchical classification, domain-specific terminology, and intricate semantic relationships, poses significant challenges for automated matching systems. While large language models (LLMs) have demonstrated strong capabilities in semantic understanding, their performance in patent matching is often limited by vocabulary mismatch and insufficient exploitation of hierarchical patent ontologies. The paper introduces MemGraph, a framework that augments LLM-based patent matching by traversing a memory graph derived from the LLM's parametric memory, extracting both entities and ontologies to guide retrieval and matching.
Figure 1: MemGraph integrates a memory graph into LLM-based patent matching, enabling comprehensive semantic understanding and accurate similarity assessment.
Methodology
Memory Graph Augmentation
MemGraph operates within a Retrieval-Augmented Generation (RAG) paradigm, enhancing both the retrieval and generation stages by leveraging two latent variables: Z_IR (entity-based query expansion) and Z_Gen (ontology-based reasoning guidance). The memory graph is constructed by prompting the LLM to traverse its parametric memory, extracting:
- Entities (V_e): Specific technical concepts, components, or methods central to the patent's innovation.
- Ontologies (V_o): Hierarchical categories and relationships, typically aligned with the International Patent Classification (IPC) system.
The traversal is performed via structured prompts, enabling the LLM to autoregressively decode relevant entities and ontologies for both query and candidate patents. These extracted elements are then used to expand the query for retrieval and to guide the reasoning process during matching.
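As a concrete illustration, the extraction step can be sketched as two structured prompts issued per patent. This is a minimal sketch: the prompt wording, the `llm_complete` helper, and the JSON output format are assumptions for illustration, not the paper's exact prompts or implementation.

```python
# Minimal sketch of entity/ontology extraction via memory graph traversal.
# `llm_complete`, the prompt wording, and the JSON output format are assumptions.
import json

def llm_complete(prompt: str) -> str:
    """Placeholder for any chat/completion backend (e.g., an OpenAI-compatible API)."""
    raise NotImplementedError("Plug in your LLM client here.")

ENTITY_PROMPT = (
    "Read the patent abstract below and list the key technical entities "
    "(components, methods, materials) central to its innovation as a JSON list of strings.\n\n"
    "Abstract:\n{abstract}\n\nEntities:"
)

ONTOLOGY_PROMPT = (
    "Read the patent abstract below and infer its hierarchical classification "
    "(IPC-style section > class > subclass) and closely related categories, "
    "as a JSON list of strings.\n\n"
    "Abstract:\n{abstract}\n\nOntology:"
)

def traverse_memory_graph(abstract: str) -> dict:
    """Extract entity nodes (V_e) and ontology nodes (V_o) for one patent."""
    # json.loads assumes the model returns clean JSON; a production system would validate.
    entities = json.loads(llm_complete(ENTITY_PROMPT.format(abstract=abstract)))
    ontology = json.loads(llm_complete(ONTOLOGY_PROMPT.format(abstract=abstract)))
    return {"entities": entities, "ontology": ontology}
```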
Figure 2: The MemGraph pipeline, illustrating entity and ontology extraction via memory graph traversal and their integration into retrieval and matching.
Integration with RAG
- Retrieval Enhancement: The query patent is expanded with extracted entities (Z_IR), improving the relevance of retrieved documents by focusing on domain-specific terminology.
- Matching Enhancement: Ontologies (Z_Gen) for both query and candidate patents are incorporated into the generation context, enabling the LLM to reason over hierarchical relationships and reduce prediction uncertainty.
This dual augmentation addresses both vocabulary mismatch and semantic ambiguity, allowing the LLM to leverage both internal memory and external evidence more effectively.
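Building on the extraction sketch above, the dual augmentation can be wired into a standard RAG loop roughly as follows. The `retrieve` helper, prompt layout, and candidate formatting are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of the dual augmentation, reusing traverse_memory_graph() and llm_complete()
# from the previous snippet. Names and prompt layout are illustrative assumptions.

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder for a retriever (e.g., BM25 or a dense index) over the patent corpus."""
    raise NotImplementedError

def match_patent(query_abstract: str) -> str:
    graph_q = traverse_memory_graph(query_abstract)

    # Z_IR: expand the query with extracted entities before retrieval.
    expanded_query = query_abstract + " " + " ".join(graph_q["entities"])
    candidates = retrieve(expanded_query, k=5)

    # Z_Gen: add query and candidate ontologies to the generation context.
    candidate_blocks = []
    for i, cand in enumerate(candidates):
        graph_c = traverse_memory_graph(cand)
        candidate_blocks.append(f"[{i}] Ontology: {graph_c['ontology']}\nText: {cand}")

    prompt = (
        f"Query patent:\n{query_abstract}\n"
        f"Query ontology: {graph_q['ontology']}\n\n"
        "Candidates:\n" + "\n\n".join(candidate_blocks) +
        "\n\nWhich candidate matches the query patent? Answer with its index and a brief rationale."
    )
    return llm_complete(prompt)
```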
Experimental Evaluation
Dataset and Baselines
Experiments are conducted on the PatentMatch dataset, comprising 1,000 patent matching questions across eight IPC sections and two languages. Baselines include vanilla LLMs, chain-of-thought (CoT) prompting, domain-specific fine-tuned LLMs (MoZi, PatentGPT), and standard RAG models.
Main Results
MemGraph achieves a 17.68% improvement over baseline LLMs and a 10.85% improvement over vanilla RAG models in average accuracy. Notably, MemGraph generalizes robustly across different LLM backbones (Llama-3.1-8B-Instruct, Qwen2-7B-Instruct, GLM-4-9B-Chat, Qwen2.5-14B-Instruct), with smaller, well-augmented backbones outperforming larger models.
Ablation Studies
Ablation experiments isolate the contributions of Z_IR and Z_Gen:
- Entity-based retrieval (Z_IR): Yields a 3.8% improvement over vanilla RAG, particularly in domains with specialized terminology.
- Ontology-based reasoning (Z_Gen): Provides a 9.2% improvement, with the largest gains in categories requiring hierarchical semantic understanding.
- Combined augmentation: Delivers the highest and most consistent performance across all IPC sections.
Utilization of Retrieved Patents
MemGraph's ontology guidance (Z_Gen) enables LLMs to resolve conflicts between internal memory and noisy external evidence, narrowing the performance gap in scenarios where retrieved patents are misleading. In "Miss-Choice" scenarios (where the correct patent is absent from retrieved evidence), MemGraph outperforms vanilla RAG by up to 9.3%, demonstrating superior robustness to retrieval noise.
Reasoning Process Analysis
MemGraph reduces model uncertainty (as measured by perplexity) and improves reasoning quality, with GPT-4o preferring MemGraph's chain-of-thought outputs in 61.1% of cases compared to 15.4% for vanilla LLMs and 23.5% for vanilla RAG. This indicates that memory graph traversal enhances both confidence and interpretability in patent matching decisions.
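For reference, answer-level perplexity of the kind used to quantify uncertainty here can be computed by scoring the generated answer tokens under the backbone model. The sketch below uses Hugging Face transformers; the backbone choice and the exact scored span are assumptions, not the paper's protocol.

```python
# Sketch of answer perplexity as an uncertainty measure; backbone and scoring span are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # any causal LM backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def answer_perplexity(prompt: str, answer: str) -> float:
    """Perplexity of `answer` tokens conditioned on `prompt` (lower = more confident)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # mask prompt tokens; approximate boundary is fine for a sketch
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean negative log-likelihood over answer tokens
    return torch.exp(loss).item()
```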

Figure 3: Evaluation results showing reduced perplexity and improved reasoning win rates for MemGraph compared to baseline models.
Case Study
A detailed case analysis reveals that MemGraph's entity and ontology extraction enables precise matching by focusing on domain-specific ingredients and hierarchical classification, whereas vanilla LLMs and RAG models are misled by procedural similarities and general terms. MemGraph's approach ensures that matching decisions are grounded in both technical detail and categorical relevance.
Implications and Future Directions
MemGraph demonstrates that explicit traversal of the LLM's parametric memory to extract entities and ontologies substantially improves patent matching performance, generalization, and reasoning quality. The framework is model-agnostic and can be integrated with various LLM architectures, suggesting broad applicability in other domains requiring hierarchical semantic understanding (e.g., legal document analysis, biomedical literature matching).
Theoretical implications include the validation of memory graph traversal as a mechanism for bridging internal and external knowledge in LLMs, mitigating vocabulary mismatch, and enhancing interpretability. Practically, MemGraph offers a scalable solution for IP management systems, reducing reliance on extensive domain-specific fine-tuning and improving robustness to retrieval noise.
Future research may explore:
- Automated construction and dynamic updating of memory graphs for evolving domains.
- Extension to multi-modal patent matching (e.g., integrating diagrams and claims).
- Application to other knowledge-intensive retrieval and matching tasks.
Conclusion
MemGraph provides a principled framework for enhancing LLM-based patent matching by traversing the memory graph to extract entities and ontologies, thereby improving retrieval relevance, reasoning quality, and robustness to noise. The approach yields strong empirical gains and offers a pathway toward more interpretable and effective AI systems for intellectual property management and other hierarchical semantic tasks.