TCM Knowledge Graph: Methods & Applications

Updated 20 July 2025

A TCM knowledge graph is a structured representation of TCM entities and their interrelations, drawn from classical literature and clinical practice.
It integrates multi-source data using advanced graph construction, embedding techniques, and NLP to support precise clinical recommendations.
Applications include personalized herb prescription, patient trajectory modeling, and linking TCM concepts with modern biomedical data.

A Traditional Chinese Medicine (TCM) Knowledge Graph is a formalized, structured representation of entities (such as symptoms, syndromes, herbs, prescriptions, treatment principles) and their interrelations derived from both classical literature and clinical practice in TCM. It provides a computational substrate to encode, reason over, and retrieve the holistic and highly interconnected knowledge characteristic of traditional Chinese diagnostic and therapeutic systems. Recent research has advanced the design and application of TCM knowledge graphs to accommodate clinical recommendation, natural language understanding, rigorous evaluation, and integration with modern biomedical informatics.

1. Graph Construction and Representation

The construction of a TCM knowledge graph typically begins with extraction of entities and their relationships from diverse data sources, including annotated classical Chinese texts, clinical case records, modern research articles, and existing TCM databases. Entities commonly modeled include symptoms, syndromes, herbs, formulas (prescriptions), pathways (e.g., meridians), acupoints, and treatment principles. Relations cover aspects such as “has_symptom,” “treats,” “composed_of,” “contraindicated_with,” “synergizes_with,” “effects_on,” and more specialized links reflecting TCM theory.
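The entity and relation types listed above map naturally onto typed triples. The following is a minimal sketch of such a store, assuming an in-memory design; the class names (`Entity`, `Triple`, `TripleStore`) and the placeholder entities (`formula_x`, `herb_a`, and so on) are hypothetical and are not taken from any cited system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "symptom", "syndrome", "herb", "formula"


@dataclass(frozen=True)
class Triple:
    head: Entity
    relation: str  # e.g. "treats", "composed_of", "has_symptom"
    tail: Entity


class TripleStore:
    """Tiny in-memory store indexed by head entity (illustrative only)."""

    def __init__(self):
        self._by_head = defaultdict(list)

    def add(self, triple):
        self._by_head[triple.head].append(triple)

    def neighbors(self, head, relation=None):
        # Return tail entities of `head`, optionally filtered by relation type.
        return [
            t.tail
            for t in self._by_head.get(head, [])
            if relation is None or t.relation == relation
        ]


# Placeholder entities, not real clinical knowledge.
formula_x = Entity("formula_x", "formula")
herb_a = Entity("herb_a", "herb")
herb_b = Entity("herb_b", "herb")
syndrome_y = Entity("syndrome_y", "syndrome")

store = TripleStore()
store.add(Triple(formula_x, "composed_of", herb_a))
store.add(Triple(formula_x, "composed_of", herb_b))
store.add(Triple(formula_x, "treats", syndrome_y))

ingredients = store.neighbors(formula_x, "composed_of")
```

Typing each entity with a `kind` field is one simple way to keep heterogeneous node types apart; a production graph would more likely sit in a graph database with an explicit ontology.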

Several graph construction methodologies have been developed:

Multi-graph Construction: For recommendation and representation tasks, multi-graph approaches build several graphs in parallel: a symptom–herb bipartite graph, a symptom–symptom co-occurrence (synergy) graph, and a herb–herb co-occurrence (compatibility) graph. For example, the symptom–herb edge indicator for symptom $s$ and herb $h$ is defined as:

$SH_{s,h} = \begin{cases} 1, & \text{if } (s,h) \text{ co-occur in a prescription} \\ 0, & \text{otherwise} \end{cases}$

(Jin et al., 2020)

Entity and Relation-Focused Views: Construction of both entity-oriented graphs (nodes for entities only) and relation-focused graphs (nodes for entities and relations) enables richer embedding learning through dual perspectives; relation-focused graphs capture subtle dependencies among semantic roles (Tong et al., 2021).
Hierarchical and SPO-T Trees: Trees built from Subject–Predicate–Object–Text (SPO-T) constructs support hierarchical organization, allowing depth-specific retrieval and evidence aggregation (Liu et al., 13 Feb 2025).
Ontology-Driven and Heterogeneous Graphs: Ontology modeling (defining concepts, relations, properties) underpins standardization and extensibility, as seen in the HBot system, which structures acupoints, meridians, symptoms, therapies, and more with robust type hierarchies (Zhang et al., 1 Aug 2024).
Automated Information Extraction: For classical Chinese sources, conditional random field (CRF) models and TF-IDF-based keyword selection are combined with dependency parsing to automate extraction of entities and their grammatical/semantic relations (Zhao et al., 16 Feb 2024). Entity–relation extraction models using BERT-CRF or ERNIE+CRF architectures support large-scale, document-level knowledge graph growth.
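The multi-graph construction described in the first item of this list can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Jin et al.: the function name `build_multigraph`, the toy prescriptions, and the choice to keep raw co-occurrence counts in the symptom–symptom and herb–herb matrices (rather than thresholding them) are all assumptions made here.

```python
from itertools import combinations

import numpy as np


def build_multigraph(prescriptions, symptoms, herbs):
    """Build SH (binary), SS and HH (co-occurrence counts) matrices.

    prescriptions: list of (symptom_set, herb_set) pairs.
    """
    s_idx = {s: i for i, s in enumerate(symptoms)}
    h_idx = {h: i for i, h in enumerate(herbs)}
    SH = np.zeros((len(symptoms), len(herbs)), dtype=int)
    SS = np.zeros((len(symptoms), len(symptoms)), dtype=int)
    HH = np.zeros((len(herbs), len(herbs)), dtype=int)

    for sym_set, herb_set in prescriptions:
        # SH[s, h] = 1 if s and h co-occur in at least one prescription.
        for s in sym_set:
            for h in herb_set:
                SH[s_idx[s], h_idx[h]] = 1
        # Symmetric symptom-symptom co-occurrence counts.
        for a, b in combinations(sorted(sym_set), 2):
            SS[s_idx[a], s_idx[b]] += 1
            SS[s_idx[b], s_idx[a]] += 1
        # Symmetric herb-herb co-occurrence counts.
        for a, b in combinations(sorted(herb_set), 2):
            HH[h_idx[a], h_idx[b]] += 1
            HH[h_idx[b], h_idx[a]] += 1
    return SH, SS, HH


# Toy data with placeholder names (not real prescriptions).
symptoms = ["cough", "fever", "fatigue"]
herbs = ["herb_a", "herb_b", "herb_c"]
prescriptions = [
    ({"cough", "fever"}, {"herb_a", "herb_b"}),
    ({"fever", "fatigue"}, {"herb_b", "herb_c"}),
]

SH, SS, HH = build_multigraph(prescriptions, symptoms, herbs)
```

On real data the count matrices would typically be thresholded or normalized before use as adjacency matrices; the appropriate cut-off is dataset-dependent and is not specified here.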

2. Graph-Based Modeling and Embedding Learning

TCM knowledge graphs serve as the foundation for advanced machine learning models, notably graph neural networks (GNNs), to learn meaningful representations over structured TCM data:

Graph Convolution Networks (GCNs): These propagate information over multi-graph structures, learning embeddings for symptoms and herbs by aggregating features from direct neighbors and synergistic partners. High-order propagations and GraphSAGE-like message passing further enrich representations (Jin et al., 2020).
Attention and Transformer Mechanisms: Multi-relational graph transformers and attention-based modules (e.g., HABRM in FMCHS) facilitate selective and context-sensitive feature fusion across intra- and inter-entity relations, including long-range dependencies in the graph (Zheng et al., 7 Mar 2025).
Virtual Nodes and Hypergraphs: The integration of virtual nodes representing properties such as medicinal nature, flavor, or channel tropism enables hierarchical modeling and higher-order interaction patterns (e.g., hypergraph networks in quantifying herbal compatibility) (Zeng et al., 18 Nov 2024).
Quaternion and Rotational Embeddings: Quaternion GNNs, as introduced in WGE, enable richer feature transformations for both entities and relations, capturing multi-view interactions and symmetry in the graph (Tong et al., 2021).
Neural Autoencoders and Embedding Spaces: Interpretable autoencoders trained on symptom–herb mappings produce latent embedding spaces (TCM-ES), where clustering and spatial proximity reflect TCM theory’s holistic mappings and relate them to biomedical features (Li et al., 15 Jul 2025).
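To make the neighbor-aggregation idea behind graph convolution concrete, the sketch below applies two rounds of the widely used symmetric-normalized propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a small symptom–herb graph. It is a generic GCN layer in NumPy with random, untrained weights, not the architecture of any cited paper; the matrix sizes and the toy adjacency are assumptions.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)  # always >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, W):
    """One propagation step: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)


# Toy symptom-herb bipartite indicator matrix (3 symptoms x 3 herbs).
SH = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
n_s, n_h = SH.shape

# Joint adjacency over all symptom and herb nodes.
A = np.block([
    [np.zeros((n_s, n_s)), SH],
    [SH.T, np.zeros((n_h, n_h))],
])

rng = np.random.default_rng(0)
H0 = rng.normal(size=(n_s + n_h, 8))  # random initial node features
W1 = rng.normal(size=(8, 4))          # untrained weights, for illustration
W2 = rng.normal(size=(4, 4))

A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, H0, W1)  # 1-hop information
H2 = gcn_layer(A_norm, H1, W2)  # 2-hop (higher-order) information

symptom_emb, herb_emb = H2[:n_s], H2[n_s:]
```

Stacking two layers lets each symptom embedding absorb information from herbs it co-occurs with and, through them, from other symptoms; in a trained model the weights would be learned from a recommendation loss rather than sampled at random.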

3. Integration of Classical, Clinical, and Molecular Data

Knowledge graph expansion increasingly involves multi-modal integration:

Classical Text Mining: Large-scale extraction from ancient and canonical TCM sources leverages LLM-driven prompt engineering for structured data output (e.g., JSON) and is followed by expert review and refinement (He et al., 28 Apr 2025, Zhao et al., 16 Feb 2024).
Clinical Case Linking: Embedding modern diagnostic data (clinical records, electronic health records) into the knowledge graph enables direct mapping from real patient symptoms to TCM ontological categories, supporting both retrospective and prospective clinical evaluation (Wei et al., 17 Nov 2024, Liu et al., 19 May 2025).
Molecular and Chemical Profiling: FMCHS demonstrates the embedding of molecular properties of herbs (e.g., SMILES strings processed by UniMol) alongside traditional symptom and property networks. Variational autoencoders are used for herbs without molecular data, ensuring dense and consistent embeddings for all graph nodes (Zheng et al., 7 Mar 2025).
Biomedical Correspondence: TCM-ES provides a bridge to modern biomedicine, mapping diseases, compounds, and targets (protein interactome networks) to their corresponding entities in the TCM graph. Bi-directional z-scores (BZS) offer quantitative measures for drug–disease associations beyond symptom co-occurrence (Li et al., 15 Jul 2025).
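The "structured LLM output followed by expert review" workflow mentioned under classical text mining can be sketched as a validation step that accepts well-formed triples and routes everything else to a review queue. The JSON field names (`subject`, `relation`, `object`), the allowed-relation set, and the sample string are assumptions for illustration; they are not the schema of OpenTCM or any other cited system.

```python
import json

# Assumed schema: each record has these three fields.
REQUIRED_KEYS = {"subject", "relation", "object"}
# A small whitelist drawn from relation names mentioned in Section 1.
ALLOWED_RELATIONS = {"has_symptom", "treats", "composed_of", "contraindicated_with"}


def parse_triples(llm_output):
    """Split raw LLM output into accepted triples and records needing review."""
    accepted, needs_review = [], []
    try:
        records = json.loads(llm_output)
    except json.JSONDecodeError:
        return accepted, [{"record": llm_output, "reason": "invalid JSON"}]
    if not isinstance(records, list):
        return accepted, [{"record": records, "reason": "expected a JSON array"}]

    for rec in records:
        if not isinstance(rec, dict) or not REQUIRED_KEYS.issubset(rec):
            needs_review.append({"record": rec, "reason": "missing field"})
        elif rec["relation"] not in ALLOWED_RELATIONS:
            needs_review.append({"record": rec, "reason": "unknown relation"})
        else:
            accepted.append((rec["subject"], rec["relation"], rec["object"]))
    return accepted, needs_review


# Hypothetical model output with placeholder entity names.
sample_output = """[
  {"subject": "formula_x", "relation": "composed_of", "object": "herb_a"},
  {"subject": "herb_a", "relation": "cures_everything", "object": "symptom_y"},
  {"subject": "herb_b", "relation": "treats"}
]"""

accepted, needs_review = parse_triples(sample_output)
```

Schema checks of this kind catch only structural problems; whether an accepted triple is medically correct still has to be decided by the expert review the text describes.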

4. Applications: Recommendation, Reasoning, Dialogue, and Clinical Decision Support

The TCM knowledge graph enables a suite of practical and analytical applications:

Herb and Prescription Recommendation: Multi-graph and multiscale correlation frameworks support syndrome-aware and personalized herb recommendation, yielding significant improvements in Precision@K, Recall@K, and F1 metrics over prior baselines (Jin et al., 2020, Zheng et al., 7 Mar 2025).
Sequential Patient Modeling: By explicitly modeling patient trajectories over multiple consultations and treatments, graph-driven models (SCEIKG) account for temporal evolution in conditions and prescriptions, capturing patient-specific treatment paths (Liu et al., 2023).
Graph-Based Retrieval-Augmented Generation: Systems like OpenTCM use GraphRAG to retrieve high-fidelity semantic subgraphs for input to LLMs, enabling accurate answering of diagnostic queries and retrieval of detailed ingredient information directly grounded in canonical TCM texts (He et al., 28 Apr 2025).
Explanation and Traceability: Attention masks and graph-based attention mechanisms permit interpretable prescription generation, making clear the knowledge elements (e.g., nature, taste, channel tropism, effect) underpinning output predictions (Pu et al., 2023).
Conversational and Multi-turn Dialogue: Architectures such as DoPI use the knowledge graph as the backbone of multi-turn doctor–patient dialogues. Guidance models query the graph to dynamically select the most informative questions, while expert models synthesize the final diagnosis and treatment plan, with a reported diagnostic accuracy above 80% (Sun et al., 7 Jul 2025).
Educational Visualization: Coupling the knowledge graph with 3D anatomical models (as in HBot) supports intuitive exploration of acupoints, meridians, and their therapeutic relations in educational and healthcare settings (Zhang et al., 1 Aug 2024).
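The ranking metrics cited for herb recommendation (Precision@K, Recall@K, and F1) can be computed per prescription as shown below. This uses the standard top-K definitions; the function name and the placeholder herb identifiers are assumptions, and individual papers may average or truncate slightly differently.

```python
def precision_recall_f1_at_k(ranked_herbs, true_herbs, k):
    """Standard top-K metrics for a single recommendation list.

    ranked_herbs: herbs ordered by predicted score, best first.
    true_herbs:   set of herbs in the ground-truth prescription.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_herbs[:k]
    hits = len(set(top_k) & set(true_herbs))
    precision = hits / k
    recall = hits / len(true_herbs) if true_herbs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder herb names; not a real prescription.
ranked = ["herb_a", "herb_b", "herb_c", "herb_d", "herb_e"]
truth = {"herb_a", "herb_c", "herb_f"}

p5, r5, f5 = precision_recall_f1_at_k(ranked, truth, k=5)
p3, r3, f3 = precision_recall_f1_at_k(ranked, truth, k=3)
```

With these toy inputs, two of the three ground-truth herbs appear in the top five, giving a precision of 2/5 and a recall of 2/3. Reported benchmark figures are typically the mean of these per-prescription values over a test set.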

5. Benchmarking, Evaluation, and Quality Control

Large-scale, domain-specific benchmarks and evaluation frameworks are essential for assessing and refining TCM knowledge graph-driven systems:

Standardized Exam Datasets: Benchmarks such as TCM-Bench (TCM-ED), TCMD, and TCM-3CEval employ thousands of exam-like questions spanning TCM theory, classical text understanding, and clinical reasoning. These serve both as evaluation tools for LLMs and as quality assurance sources for knowledge graph construction (Yue et al., 3 Jun 2024, Yu et al., 7 Jun 2024, Huang et al., 10 Mar 2025).
Domain-specific Metrics: The development of TCMScore—which integrates Term F1 (via standardized concept matching) and semantic inference—enables objective, expert-level assessment of LLM and graph-based outputs, highlighting the importance of preserving both semantic and domain-specific fidelity (Yue et al., 3 Jun 2024).
Robustness Analysis: Methods such as answer option shuffling and ensemble voting are used to reveal model inconsistencies and inform strategies for graph update and maintenance (Yu et al., 7 Jun 2024).
Multi-dimensional Clinical Evaluation: Human expert evaluation protocols measure not only factual accuracy and safety but also explainability, consistency, compliance, and self-consistency in real-world diagnostic and treatment tasks (Liu et al., 13 Feb 2025, Sun et al., 7 Jul 2025, Liu et al., 19 May 2025).
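As a rough illustration of a term-level F1 based on standardized concept matching, the sketch below normalizes predicted and reference terms through a synonym table and then computes set-based F1. It is a simplified stand-in and not the published TCMScore definition; the function name, the synonym table, and the example terms are assumptions.

```python
def term_f1(predicted_terms, reference_terms, synonyms=None):
    """Set-based F1 over terms after mapping each to a canonical form."""
    synonyms = synonyms or {}

    def normalize(term):
        key = term.strip().lower()
        return synonyms.get(key, key)

    pred = {normalize(t) for t in predicted_terms}
    ref = {normalize(t) for t in reference_terms}
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical synonym table mapping variant wording to a canonical term.
synonym_table = {"qi vacuity": "qi deficiency"}

score = term_f1(
    predicted_terms=["Qi vacuity", "blood stasis", "dampness"],
    reference_terms=["qi deficiency", "blood stasis"],
    synonyms=synonym_table,
)
```

Here two of three predicted terms match the two reference terms after normalization, so precision is 2/3, recall is 1, and the F1 is 0.8. A real metric would draw the synonym table from a standardized terminology and would be combined with a semantic-inference component, as the text notes.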

6. Future Directions and Challenges

Current and future research in TCM knowledge graphs highlights several opportunities and obstacles:

Integration with Modern Biomedicine: The alignment of TCM entity embeddings with genomic, proteomic, and pathway data paves the way for novel insights into TCM’s mechanistic validity, drug repurposing, and multi-omics medicine (Li et al., 15 Jul 2025, Zeng et al., 18 Nov 2024).
Enhancement through LLMs: Dedicated TCM LLMs (e.g., BianCang, Tianyi) pre-trained and fine-tuned on data derived from TCM knowledge graphs, clinical records, and classical corpora are being deployed both as QA systems and as engines for dynamic knowledge graph construction and maintenance (Wei et al., 17 Nov 2024, Liu et al., 19 May 2025).
Personalized and Dynamic Modeling: Embedding additional patient-specific attributes (age, gender, comorbidities, BMI) and temporal modeling (disease trajectories rather than static entities) are priorities for enabling individualized treatment strategies (Li et al., 15 Jul 2025).
Data Quality and Maintenance: Ensuring reliable information requires continued human oversight, robust entity linking, conflict resolution during data fusion, and handling of the linguistic challenges unique to classical Chinese (He et al., 28 Apr 2025, Zhao et al., 16 Feb 2024).
Explainability and Ethical Considerations: The drive for interpretability—whether via attention visualization, embedding proximities, or trace-back to canonical sources—remains central to the responsible and clinically trusted deployment of TCM knowledge graph systems.

A TCM Knowledge Graph, as defined by current research, serves as a critical bridge between empirical holistic wisdom and computationally tractable, explainable AI. By supporting advanced modeling, retrieval, reasoning, and clinical applications, it is positioned to enhance TCM’s scientific maturation and integration with modern evidence-based medicine.