
Personal Health Knowledge Graphs

Updated 20 February 2026
  • PHKGs are patient-centric, formally structured graphs that integrate heterogeneous biomedical, behavioral, and social data into a unified, machine-interpretable framework.
  • They employ rigorous ontology design, data extraction, normalization, and graph embedding techniques to support clinical decision support and predictive modeling.
  • PHKGs facilitate personalized analytics and explainable AI insights, but must address challenges in data privacy, scalability, and heterogeneous data integration.

A Personal Health Knowledge Graph (PHKG) is a formally structured, semantically annotated, patient- or person-centric representation that integrates heterogeneous health-related data into a unified, queryable, and machine-interpretable graph. PHKGs encode individualized biomedical, behavioral, social, environmental, and historical facets of a single patient’s health profile, enabling comprehensive downstream analysis, reasoning, and predictive modeling for precision medicine and digital health applications.

1. Core Definitions and Formal Structures

PHKGs are typically instantiated as multi-relational directed graphs $G = (V, E, \tau)$, where:

  • $V$ is a set of entities: patient, diagnosis, symptom, medication, procedure, lab result, genetic variant, lifestyle factor, social determinant, device measurement, etc.
  • $E \subseteq V \times R \times V$ encodes typed relationships, where $R$ is the relation vocabulary (e.g., “hasDiagnosis,” “prescribed,” “hasMeasurement,” “hasSocialContext”).
  • $\tau: V \cup E \rightarrow T$ maps nodes and edges to ontological schema types in a type set $T$ (often grounded in biomedical standards such as SNOMED-CT, RxNorm, LOINC).
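
As a minimal sketch of this definition in Python, the class below stores $V$, $R$, $E$, and a node-typing map, and rejects any edge that falls outside $V \times R \times V$. The class name `PHKG`, the type labels `Patient` and `Diagnosis`, and the node identifiers are hypothetical placeholders rather than real ontology classes or codes, and treating an edge's type as its relation label is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PHKG:
    """Minimal typed multi-relational graph G = (V, E, tau)."""

    relations: set[str]                                            # R: relation vocabulary
    nodes: set[str] = field(default_factory=set)                   # V
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # E, a subset of V x R x V
    node_type: dict[str, str] = field(default_factory=dict)        # tau restricted to V

    def add_node(self, node: str, schema_type: str) -> None:
        self.nodes.add(node)
        self.node_type[node] = schema_type

    def add_edge(self, head: str, relation: str, tail: str) -> None:
        # Enforce E ⊆ V x R x V: endpoints must be known nodes, relation must be in R.
        if head not in self.nodes or tail not in self.nodes:
            raise ValueError(f"unknown node in ({head}, {relation}, {tail})")
        if relation not in self.relations:
            raise ValueError(f"relation {relation!r} is not in R")
        self.edges.add((head, relation, tail))

    def type_of(self, item):
        # tau: V ∪ E -> T. Nodes carry an explicit schema type; as a
        # simplifying assumption, an edge's type is its relation label.
        if isinstance(item, tuple):
            if item not in self.edges:
                raise KeyError(item)
            return item[1]
        return self.node_type[item]


g = PHKG(relations={"hasDiagnosis", "prescribed"})
g.add_node("patient:P1", "Patient")
g.add_node("dx:D1", "Diagnosis")
g.add_edge("patient:P1", "hasDiagnosis", "dx:D1")
print(g.type_of("dx:D1"))  # Diagnosis
```

In an RDF-based deployment the same typing would normally be expressed with `rdf:type` triples against ontology classes rather than a Python dictionary.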

Each assertion is represented as an RDF triple $(h, r, t)$ or, for temporally extended or provenance-rich assertions, as a quadruple or reified node incorporating context such as timestamps or data sources (Khatib et al., 2024, Rastogi et al., 2020). PHKGs differ from population-level KGs by being restricted to the personal health context: $V$ includes only patient-relevant nodes, and $E$ contains health-specific, time-evolving relations (Shirai et al., 2021).
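
The sketch below illustrates one way to attach context to a triple, assuming each assertion carries an observation date and a source label (both hypothetical fields chosen for illustration). Because facts are time-stamped, the graph can be read as a snapshot at a given date or queried for the most recent value of a relation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assertion:
    """A triple (h, r, t) plus context: a quadruple-style, provenance-rich assertion."""

    head: str
    relation: str
    tail: str
    observed: date   # hypothetical context field: when the fact was recorded
    source: str      # hypothetical context field: originating system


def facts_as_of(assertions, when):
    """Plain triples recorded on or before `when` (a snapshot of the graph)."""
    return {(a.head, a.relation, a.tail) for a in assertions if a.observed <= when}


def latest(assertions, head, relation):
    """Most recently recorded tail for (head, relation), or None if absent."""
    matches = [a for a in assertions if a.head == head and a.relation == relation]
    return max(matches, key=lambda a: a.observed).tail if matches else None


log = [
    Assertion("patient:P1", "hasMeasurement", "obs:hr-1", date(2024, 1, 10), "wearable"),
    Assertion("patient:P1", "hasMeasurement", "obs:hr-2", date(2024, 2, 10), "wearable"),
    Assertion("patient:P1", "hasDiagnosis", "dx:D1", date(2023, 6, 1), "ehr"),
]
print(facts_as_of(log, date(2024, 1, 31)))
print(latest(log, "patient:P1", "hasMeasurement"))  # obs:hr-2
```

A production system might instead encode this context with RDF named graphs or reification; the dataclass only mirrors the idea.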

The schema often organizes nodes in a "star-shaped" topology, with the patient as the central node linked directly or via grouping nodes to demographic, clinical, and social facets (Theodoropoulos et al., 2023).
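
A small sketch of the star-shaped layout follows. It assumes grouping nodes are minted as `patient/Group` identifiers and linked to the patient by a hypothetical `hasFacetGroup` relation; the facet relations `hasAge`, `hasDisease`, and `hasSocialContext` follow the HSPO edge names mentioned in Section 2, while the value nodes are placeholders.

```python
def build_star(patient, facets):
    """Build a star-shaped patient subgraph as a list of (h, r, t) triples.

    `facets` maps a grouping-node label to (relation, value) pairs; the key
    None means "attach these values directly to the patient node".
    """
    triples = []
    for group, pairs in facets.items():
        if group is None:
            anchor = patient                   # direct patient -> value edges
        else:
            anchor = f"{patient}/{group}"      # hypothetical grouping-node naming scheme
            triples.append((patient, "hasFacetGroup", anchor))
        for relation, value in pairs:
            triples.append((anchor, relation, value))
    return triples


star = build_star("patient:P1", {
    None: [("hasAge", "age:67")],
    "Clinical": [("hasDisease", "disease:E11")],
    "Social": [("hasSocialContext", "context:rented-housing")],
})
for triple in star:
    print(triple)
```

Every facet value ends up at most two hops from the patient, which keeps per-patient traversal and subgraph extraction simple.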

2. Ontology and Schema Design

PHKGs are governed by rigorous ontological frameworks to ensure semantic consistency, interoperability, and reasoning capability:

  • Terminologies and standards such as SNOMED-CT, RxNorm, LOINC, and HL7 FHIR, together with custom social/behavioral schemas, define valid entity types and relationships (Khatib et al., 2024, Shirai et al., 2021).
  • Schema alignment, mapping, and normalization are critical: codes and mentions from data sources are mapped to standardized terms using methods such as string similarity, UMLS CUI matching, and structural alignment (Khatib et al., 2024, Bloor et al., 2023).
  • A representative example is the Health and Social Person-centric Ontology (HSPO), which encodes demographic (age, gender), clinical (disease, procedure, medication, intervention), and social (employment, housing, household) classes, with edge types such as hasAge, hasDisease, and hasSocialContext (Theodoropoulos et al., 2023).
  • For diet and lifestyle, ontologies incorporate food vocabularies, social determinants, and temporal patterns, annotated using standards such as OWL, PROV-O, and SIO together with domain-specific semantic constraints (Seneviratne et al., 2021).
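
String-similarity mapping, one of the normalization methods listed above, can be sketched with Python's standard `difflib.SequenceMatcher`. The three-entry vocabulary, its concept IDs (`C001` to `C003`, not real UMLS CUIs), and the 0.8 threshold are illustrative assumptions; a real pipeline would match against a full terminology and typically combine several signals.

```python
from difflib import SequenceMatcher

# Hypothetical mini-vocabulary: preferred term -> placeholder concept ID (not real CUIs).
VOCAB = {
    "type 2 diabetes mellitus": "C001",
    "hypertension": "C002",
    "chronic obstructive pulmonary disease": "C003",
}


def normalize(text):
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def map_mention(mention, vocab=VOCAB, threshold=0.8):
    """Return (concept_id, score) for the most similar term, or None below threshold."""
    query = normalize(mention)
    best_term, best_score = None, 0.0
    for term in vocab:
        score = SequenceMatcher(None, query, term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_term is not None and best_score >= threshold:
        return vocab[best_term], best_score
    return None


print(map_mention("Type-2 Diabetes Mellitus"))  # ('C001', 1.0)
print(map_mention("broken ankle"))              # None
```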

PHKG schemas are extensible to cover behavioral, genomic, and device-derived entities, enabling holistic, multimodal health modeling (Theodoropoulos et al., 2023, Khatib et al., 2024).
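
To illustrate schema-level constraints and extensibility, the sketch below stores each relation's expected domain and range class and flags triples that violate them. Extending the schema with a device-derived relation then makes a previously rejected triple valid. The class names and the `hasDeviceMeasurement` relation are hypothetical, and exact class matching stands in for the subclass reasoning an OWL reasoner would perform.

```python
# Hypothetical schema: relation -> (domain class, range class).
SCHEMA = {
    "hasAge": ("Person", "Age"),
    "hasDisease": ("Person", "Disease"),
    "hasSocialContext": ("Person", "SocialContext"),
}


def violations(triples, node_types, schema):
    """Return triples with an unknown relation or wrongly typed endpoints.

    Classes are compared exactly; no subclass inference is performed.
    """
    bad = []
    for head, relation, tail in triples:
        expected = schema.get(relation)
        if expected is None:
            bad.append((head, relation, tail))
            continue
        domain, range_ = expected
        if node_types.get(head) != domain or node_types.get(tail) != range_:
            bad.append((head, relation, tail))
    return bad


def extend_schema(schema, relation, domain, range_):
    """Return a new schema with one added relation; the input is not modified."""
    if relation in schema:
        raise ValueError(f"relation {relation!r} already defined")
    return {**schema, relation: (domain, range_)}


types = {"p1": "Person", "d1": "Disease", "m1": "DeviceMeasurement"}
facts = [("p1", "hasDisease", "d1"), ("p1", "hasDeviceMeasurement", "m1")]
print(violations(facts, types, SCHEMA))      # the device triple is flagged
extended = extend_schema(SCHEMA, "hasDeviceMeasurement", "Person", "DeviceMeasurement")
print(violations(facts, types, extended))    # now empty
```

Returning a new schema rather than mutating the old one keeps earlier schema versions available, which matters once graphs built under different versions coexist.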

3. Data Integration and Knowledge Extraction

PHKGs aggregate data from diverse, multimodal sources, requiring robust data pipelines and harmonization:

  • Structured data: EHR tables (demographics, diagnoses, labs, prescriptions), genomic assays, device feeds.
  • Semi-structured data: HL7 FHIR resources, templated notes, sensor JSON/XML streams.
  • Unstructured data: clinical narratives, radiology reports, wearable lifelogs, patient-reported outcomes (Khatib et al., 2024, Rastogi et al., 2020).
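
For semi-structured sources, the sketch below turns a FHIR-style Observation resource into triples. It assumes only the commonly used Observation fields `subject.reference`, `code.coding`, `valueQuantity`, and `effectiveDateTime`. The relation names other than `hasMeasurement` and the observation code `0000-0` are placeholders, not real vocabulary.

```python
import json


def observation_to_triples(resource_json):
    """Convert a FHIR-style Observation (JSON text) into (h, r, t) triples.

    Only subject.reference, code.coding, valueQuantity, and effectiveDateTime
    are read; every other field is ignored in this sketch.
    """
    obs = json.loads(resource_json)
    if obs.get("resourceType") != "Observation":
        raise ValueError("expected an Observation resource")
    node = f"Observation/{obs['id']}"
    triples = [(obs["subject"]["reference"], "hasMeasurement", node)]
    for coding in obs.get("code", {}).get("coding", []):
        # system|code pairs keep the code system attached to each code.
        triples.append((node, "hasCode", f"{coding['system']}|{coding['code']}"))
    quantity = obs.get("valueQuantity")
    if quantity is not None:
        triples.append((node, "hasValue", f"{quantity['value']} {quantity['unit']}"))
    if "effectiveDateTime" in obs:
        triples.append((node, "observedAt", obs["effectiveDateTime"]))
    return triples


example = json.dumps({
    "resourceType": "Observation",
    "id": "obs1",
    "subject": {"reference": "Patient/p1"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "0000-0"}]},  # placeholder code
    "valueQuantity": {"value": 72, "unit": "/min"},
    "effectiveDateTime": "2024-02-10T08:00:00Z",
})
for triple in observation_to_triples(example):
    print(triple)
```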

Key steps:

  1. Extraction: Named Entity Recognition (NER) and Relation Extraction (RE) identify concepts and edges; NLP pipelines annotate free text with mappings to ontology classes (Khatib et al., 2024, Theodoropoulos et al., 2023).
  2. Transformation: Entity normalization to canonical concepts (e.g., grouping ICD codes at the family level), value normalization, and de-identification (Theodoropoulos et al., 2023).
  3. Loading: Insertion as RDF triples or property-annotated nodes into a graph database (e.g., Neo4j, Blazegraph, or another RDF triple store). Each patient record becomes a subgraph with central and facet nodes; edges may be omitted where data are missing (Theodoropoulos et al., 2023, Bloor et al., 2023).
  4. Personalization: Filtering to patient-specific subgraphs, periodic updating as new observations are made, and maintaining provenance (Rastogi et al., 2020, Shirai et al., 2021).
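
The transformation, loading, and personalization steps can be sketched for a single tabular EHR row. The sketch assumes a hypothetical row layout (`patient_id`, `age`, `icd10_codes`, `medication`), groups ICD-10 codes at the three-character category level as one reading of family-level grouping, skips missing fields rather than emitting edges for them, and approximates personalization as keeping the triples whose head is the patient.

```python
def icd10_family(code):
    """Group an ICD-10 code at its three-character category, e.g. 'E11.9' -> 'E11'."""
    return code.split(".")[0][:3].upper()


def load_patient(record):
    """Transform one EHR row into triples; fields that are None produce no edge."""
    pid = f"patient:{record['patient_id']}"
    triples = []
    if record.get("age") is not None:
        triples.append((pid, "hasAge", str(record["age"])))
    # dict.fromkeys removes duplicate families while keeping first-seen order.
    for family in dict.fromkeys(icd10_family(c) for c in record.get("icd10_codes") or []):
        triples.append((pid, "hasDiagnosis", f"icd10:{family}"))
    if record.get("medication") is not None:
        triples.append((pid, "prescribed", record["medication"]))
    return triples


def patient_subgraph(store, pid):
    """Personalization (simplified): keep triples whose head is the patient node."""
    return [t for t in store if t[0] == pid]


store = load_patient({"patient_id": "P1", "age": 67,
                      "icd10_codes": ["E11.9", "E11.65", "I10"], "medication": None})
store += load_patient({"patient_id": "P2", "age": None,
                       "icd10_codes": ["J44.1"], "medication": "med:example"})
print(patient_subgraph(store, "patient:P1"))
```

Here `E11.9` and `E11.65` collapse into a single `icd10:E11` edge, and P1's missing medication simply produces no `prescribed` edge.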

Integration with public or global biomedical KGs is achieved through entity linking, either embedding-based (e.g., SapBERT) or LLM-assisted (e.g., GPT-4) (Xie et al., 26 Jul 2025), and through reuse of external nodes in personal subgraphs via owl:sameAs or custom edges (Rastogi et al., 2020, Jiang et al., 2023).
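
Embedding-based linking can be sketched as a nearest-neighbour search by cosine similarity that emits an `owl:sameAs` edge when the best match clears a threshold. The toy three-dimensional vectors, the `kg:` identifiers, and the 0.9 threshold are assumptions. In practice the vectors would come from a biomedical encoder such as SapBERT, and the search would use an approximate nearest-neighbour index.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def link_entity(local_node, local_vec, kg_index, threshold=0.9):
    """Return an owl:sameAs triple to the closest global entity, or None."""
    best_iri, best_sim = max(
        ((iri, cosine(local_vec, vec)) for iri, vec in kg_index.items()),
        key=lambda pair: pair[1],
        default=(None, 0.0),
    )
    if best_iri is not None and best_sim >= threshold:
        return (local_node, "owl:sameAs", best_iri)
    return None


# Toy, hand-written "embeddings"; real ones would come from an encoder.
kg_index = {"kg:Diabetes": [1.0, 0.0, 0.1], "kg:Asthma": [0.0, 1.0, 0.0]}
print(link_entity("local:dm", [0.9, 0.05, 0.1], kg_index))
print(link_entity("local:unknown", [0.5, 0.5, 0.0], kg_index))  # None
```

Returning `None` below the threshold leaves ambiguous mentions unlinked rather than forcing a possibly wrong `owl:sameAs` assertion.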

4. Embedding, Inference, and Predictive Modeling

Learned PHKG representations support downstream tasks by leveraging graph-based embedding models and reasoning engines:
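
As one concrete example of a graph-embedding model, chosen here for illustration (the sources cited in this article may use others), TransE treats a triple as plausible when $\mathbf{t} \approx \mathbf{h} + \mathbf{r}$ and scores it as $-\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_2$. The sketch below ranks candidate tails for a query. The two-dimensional vectors and the `hasRisk` relation are hand-written placeholders, not trained embeddings, and training (usually a margin-based ranking loss over corrupted triples) is omitted.

```python
import math


def transe_score(h, r, t):
    """TransE score -||h + r - t||_2: closer to 0 means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h, r, candidates):
    """Rank candidate tail entities for the query (h, r, ?) by TransE score."""
    return sorted(candidates, key=lambda name: transe_score(h, r, candidates[name]), reverse=True)


# Hand-written placeholder vectors, not trained embeddings.
patient = [0.0, 0.0]
has_risk = [1.0, 0.0]
candidates = {"risk:A": [1.0, 0.1], "risk:B": [0.0, 1.0]}
print(rank_tails(patient, has_risk, candidates))  # ['risk:A', 'risk:B']
```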

5. Practical Applications and Impact

PHKGs enable a wide range of personalized, context-aware, and explainable digital health functionalities:

  • Clinical Decision Support: Patient-specific subgraphs support risk prediction, treatment recommendation, and alerting (e.g., COPD monitoring using ontologized alert rules) (Bloor et al., 2023).
  • Personalization: Integration of behavioral, dietary, and social determinants allows for tailored recommendations, such as meal planning for diabetes accounting for preferences and glycemic impact (Seneviratne et al., 2021, Rastogi et al., 2020).
  • Population Health and Clinical Trials: Cohort selection and protocol matching via graph similarity and ontological expansion (Khatib et al., 2024).
  • Patient Engagement and mHealth: Decentralized PHKGs (e.g., Solid PODs) empower patients to control, query, and share their health data and context (Ammar et al., 2021).
  • Research and Analytics: Cohort clustering, longitudinal modeling (temporal PHKGs), and outcome stratification (Khatib et al., 2024).
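
Cohort selection with ontological expansion and a simple similarity measure can be sketched as follows. The is-a hierarchy and diagnosis identifiers are hypothetical, and Jaccard similarity over each patient's diagnosis neighbours is a deliberately simple stand-in for the graph-similarity methods used in practice.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
IS_A = {"dx:T2DM": "dx:Diabetes", "dx:T1DM": "dx:Diabetes", "dx:Diabetes": "dx:Metabolic"}


def expand(concept, is_a=IS_A):
    """Ontological expansion: the concept together with all its descendants."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def select_cohort(patients, criterion):
    """IDs of patients with at least one diagnosis subsumed by `criterion`."""
    expanded = expand(criterion)
    return sorted(pid for pid, diagnoses in patients.items() if diagnoses & expanded)


def jaccard(a, b):
    """Jaccard similarity of two neighbour sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patients = {"P1": {"dx:T2DM", "dx:HTN"}, "P2": {"dx:T1DM", "dx:HTN"}, "P3": {"dx:Asthma"}}
print(select_cohort(patients, "dx:Diabetes"))     # ['P1', 'P2']
print(jaccard(patients["P1"], patients["P2"]))
```

Without expansion, a trial criterion of `dx:Diabetes` would match no one, because no patient is coded at that level; expansion recovers both diabetic patients.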

Quantitative studies demonstrate that PHKG-augmented models outperform tabular baselines, especially in sparse-data or limited-sample regimes (Theodoropoulos et al., 2023, Jiang et al., 2023, Xie et al., 26 Jul 2025).

6. Methodological Challenges and Open Problems

The construction and maintenance of PHKGs surface several fundamental challenges:

  • Data Privacy and Security: PHKGs contain sensitive protected health information (PHI) and must incorporate pseudonymization, fine-grained access control, and privacy-preserving computation (differential privacy, federated learning, distributed graphs) (Khatib et al., 2024, Ammar et al., 2021, Shirai et al., 2021).
  • Scalability and Maintenance: Per-patient graphs avoid the scale of global KGs but require robust update strategies, versioning, incremental integration, and coping with high-velocity device data (Theodoropoulos et al., 2023, Khatib et al., 2024).
  • Heterogeneous Data Integration: Alignment across modalities (EHR, genomics, wearables, PROs), devices, and evolving schemas remains nontrivial; ontology drift and entity disambiguation are active research areas (Shirai et al., 2021).
  • Temporal and Longitudinal Modeling: Emerging needs include temporal edge tracking, dynamic node state management, and causal inference for outcome simulation (Khatib et al., 2024).
  • Explainability and Trust: Maintaining provenance and interpretability, especially in ML-guided recommendations, is critical for clinical adoption (Rastogi et al., 2020, Zhao et al., 9 Dec 2025).
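
Pseudonymization, the first privacy measure listed above, can be sketched with a keyed hash from Python's standard library. HMAC-SHA256 maps the same identifier to the same pseudonym, so records can still be joined, but the mapping cannot be recomputed without the key. The `pseudo:` prefix, the 16-hex-character truncation, and the inline demo key are illustrative assumptions; real deployments would keep the key in a secrets store, and pseudonymized graphs may still be re-identifiable through quasi-identifiers.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Deterministic keyed pseudonym for a direct identifier (HMAC-SHA256, truncated)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"pseudo:{digest[:16]}"


def pseudonymize_triples(triples, identifiers, secret_key):
    """Replace every node found in `identifiers` with its pseudonym; other values are kept."""
    return [
        tuple(pseudonymize(x, secret_key) if x in identifiers else x for x in triple)
        for triple in triples
    ]


key = b"demo-key-do-not-use-in-production"  # illustrative only; load real keys from a secrets store
graph = [("MRN-0001", "hasDiagnosis", "icd10:E11"), ("MRN-0001", "hasAge", "67")]
print(pseudonymize_triples(graph, {"MRN-0001"}, key))
```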

Persistent open questions include balancing on-device versus cloud deployment, validation of subgraph fidelity, and strategies for summarization or pruning without loss of critical context (Rastogi et al., 2020).
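
One naive pruning strategy, sketched here under stated assumptions, keeps only triples within a fixed number of hops of the patient node while always retaining triples that touch nodes marked as critical (for example, a drug interaction). The hop limit, the critical set, and the relation names are hypothetical. Note that retained critical triples may end up disconnected from the patient if their connecting path is pruned, which is exactly the kind of context loss the open question refers to.

```python
from collections import deque


def prune_subgraph(triples, root, max_hops, critical):
    """Keep triples within `max_hops` of `root` plus any triple touching a critical node.

    Hop distance ignores edge direction. Critical triples are kept even if the
    path linking them to the root is pruned.
    """
    neighbours = {}
    for head, _, tail in triples:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)
    distance = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search bounded by max_hops
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return [
        (h, r, t) for h, r, t in triples
        if (h in distance and t in distance) or h in critical or t in critical
    ]


patient_triples = [
    ("patient:P1", "hasDiagnosis", "dx:D1"),
    ("dx:D1", "treatedBy", "med:M1"),
    ("med:M1", "interactsWith", "med:M2"),
    ("patient:P1", "hasAllergy", "allergen:A1"),
]
print(prune_subgraph(patient_triples, "patient:P1", 1, critical={"med:M2"}))
```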

7. Future Directions

Current trends and proposed advancements for PHKGs include:

  • Integration with richer multi-omics, behavioral, and sensor modalities to enable more comprehensive, real-time patient modeling (Khatib et al., 2024, Bloor et al., 2023).
  • Interoperable service architectures: PHKGs exposed over RDF/SPARQL interfaces with flexible export for ML frameworks (e.g., PyG) (Theodoropoulos et al., 2023).
  • Federated and privacy-preserving infrastructures: Local graph instantiation, global knowledge transfer, and edge-level control (Ammar et al., 2021, Zhao et al., 9 Dec 2025).
  • Explainable AI: Incorporation of LLMs for traceable, clinically salient explanations tied to specific subgraph paths and graph-attention mechanisms (Zhao et al., 9 Dec 2025, Jiang et al., 2023).
  • Clinical deployment and evaluation in multi-institutional environments: Validation of scalability, robustness to missing data, and reproducibility across settings (Theodoropoulos et al., 2023).
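
Export for ML frameworks such as PyG largely amounts to mapping node and relation identifiers to contiguous integers. The sketch below is a minimal illustration, not a PyG API: it builds a `2 x num_edges` index list plus per-edge relation ids from plain triples, and the relation names in the example are placeholders.

```python
def to_edge_index(triples):
    """Convert (h, r, t) triples into integer index lists for graph ML frameworks.

    Returns (node_ids, rel_ids, edge_index, edge_type), where edge_index is a
    2 x num_edges nested list (row 0 = source ids, row 1 = target ids).
    """
    node_ids, rel_ids = {}, {}
    src, dst, edge_type = [], [], []
    for head, relation, tail in triples:
        for node in (head, tail):
            node_ids.setdefault(node, len(node_ids))   # assign ids in first-seen order
        rel_ids.setdefault(relation, len(rel_ids))
        src.append(node_ids[head])
        dst.append(node_ids[tail])
        edge_type.append(rel_ids[relation])
    return node_ids, rel_ids, [src, dst], edge_type


kg = [("p", "hasDiagnosis", "d"), ("p", "prescribed", "m"), ("m", "treats", "d")]
node_ids, rel_ids, edge_index, edge_type = to_edge_index(kg)
print(edge_index)  # [[0, 0, 2], [1, 2, 1]]
print(edge_type)   # [0, 1, 2]
```

With PyTorch and PyTorch Geometric installed, `torch.tensor(edge_index, dtype=torch.long)` yields the `[2, num_edges]` tensor that `torch_geometric.data.Data(edge_index=...)` expects, and `edge_type` can be passed to relational layers such as `RGCNConv`.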

PHKG research is poised to drive advances in personalized, data-driven healthcare, unified patient modeling, and transparent, semantically grounded decision support by leveraging ontological rigor, advanced graph learning, and integrative data fusion (Khatib et al., 2024, Theodoropoulos et al., 2023, Jiang et al., 2023, Xie et al., 26 Jul 2025).
