Personal Knowledge Graphs
- Personal Knowledge Graphs are user-owned, formalized graphs that enable precise access control, provenance tracking, and continuous evolution of personal data.
- They integrate structured and unstructured data via entity extraction, semantic alignment, and ontology linking to support personalized services.
- PKGs fuel innovations in healthcare, research, and education while addressing challenges in scalability, privacy, and dynamic knowledge adaptation.
Personal Knowledge Graphs (PKGs) are formalized, user-owned graph structures representing facts of personal relevance, with precise control over access, provenance, and ongoing evolution. They serve as foundational substrates for a diverse range of applications, from personalized healthcare and research support to scalable recommender systems, privacy-preserving assistants, and adaptive educational experiences.
1. Formal Definitions and Core Structural Principles
PKGs generalize the knowledge graph (KG) paradigm by emphasizing individual data ownership, fine-grained access control, and the centrality of personal or user-centric entities and relations. A canonical definition is (Skjæveland et al., 2023):
A personal knowledge graph (PKG) is a KG where a single individual, called the owner, has (1) full read and write access to the KG, and (2) the exclusive right to grant others read and write access to any specified part. The primary purpose of the PKG is to support the delivery of services customized particularly to its owner.
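The two rights in this definition can be illustrated with a minimal Python sketch. It rests on stated assumptions: a "part" of the graph is modelled as a set of predicates, and the class and method names (`OwnedPKG`, `grant`, `can`, `view`) are hypothetical, not taken from any cited system.

```python
from dataclasses import dataclass, field


@dataclass
class OwnedPKG:
    """Toy PKG enforcing owner-centric rights (hypothetical sketch).

    Assumption: a "part" of the graph is a set of predicates; real systems
    may scope grants by subgraph, node, or individual triple instead.
    """
    owner: str
    triples: set = field(default_factory=set)
    # grants[agent] = (readable predicates, writable predicates)
    grants: dict = field(default_factory=dict)

    def grant(self, requester, agent, predicates, write=False):
        # (2) Only the owner may grant others access to any part of the KG.
        if requester != self.owner:
            raise PermissionError(f"{requester} is not the owner")
        readable, writable = self.grants.setdefault(agent, (set(), set()))
        readable.update(predicates)  # write access implies read access here
        if write:
            writable.update(predicates)

    def can(self, agent, predicate, mode):
        if agent == self.owner:
            return True  # (1) the owner has full read and write access
        readable, writable = self.grants.get(agent, (set(), set()))
        return predicate in (writable if mode == "write" else readable)

    def add(self, agent, s, p, o):
        if not self.can(agent, p, "write"):
            raise PermissionError(f"{agent} may not write {p}")
        self.triples.add((s, p, o))

    def view(self, agent):
        """The subgraph that `agent` is allowed to read."""
        return {t for t in self.triples if self.can(agent, t[1], "read")}


pkg = OwnedPKG(owner="alice")
pkg.add("alice", "alice", "likes", "jazz")
pkg.add("alice", "alice", "hasCondition", "asthma")
pkg.grant("alice", "music_app", {"likes"})
print(pkg.view("music_app"))  # {('alice', 'likes', 'jazz')}
```

Scoping grants by predicate keeps the example short; a finer-grained design would attach rights to individual triples.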
Structurally, PKGs are modeled as directed, labeled multigraphs or RDF graphs. Typical formalizations include:
- $G = (V, E, A)$: $V$ is the set of (user-relevant) nodes with type and property annotations, $E$ is the set of labeled, directed edges (relations), and $A$ encodes per-node/edge attributes (Bloor et al., 2023, Kejriwal, 2023, Chakraborty et al., 2022).
- PKG data models frequently blend a core schema (e.g., FOAF, schema.org) with extensible domain ontologies, property graphs (e.g., Neo4j), or RDF triple stores for semantic interoperability (Bernard et al., 2024, Chakraborty et al., 2022).
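The node/edge/attribute formalization maps onto a small in-memory structure. The following Python sketch is illustrative only: class and method names are hypothetical, and the `foaf:` / `schema:` type strings are used as plain labels rather than resolved vocabulary terms.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PKGraph:
    """Directed, labelled multigraph G = (V, E, A) (illustrative sketch).

    V maps node ids to types, E is a list of (head, relation, tail) triples
    (so parallel edges are allowed), and A holds attributes keyed by node id
    or by ("edge", index). All names are hypothetical.
    """
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    attrs: defaultdict = field(default_factory=lambda: defaultdict(dict))

    def add_node(self, node, node_type, **attributes):
        self.nodes[node] = node_type
        self.attrs[node].update(attributes)

    def add_edge(self, head, relation, tail, **attributes):
        # Edges may only connect nodes that are already in V.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("add both endpoints to V before linking them")
        self.edges.append((head, relation, tail))
        self.attrs[("edge", len(self.edges) - 1)].update(attributes)

    def out_edges(self, node):
        return [e for e in self.edges if e[0] == node]


g = PKGraph()
g.add_node("me", "foaf:Person", name="Alice")
g.add_node("jazz", "schema:MusicGenre")
g.add_edge("me", "likes", "jazz", weight=0.9)
print(g.out_edges("me"))  # [('me', 'likes', 'jazz')]
```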
Within the health domain, specialization yields the “Personal Health Knowledge Graph” (PHKG), whose personal entities, attributes, and relations describe individual health states and clinical measurements and are contextually linked to standard ontologies and health-data standards (SNOMED CT, ICD-9, FHIR) (Bloor et al., 2023, Shirai et al., 2021, Seneviratne et al., 2021).
Similarly, in research, the “Personal Research Knowledge Graph” (PRKG) restricts the subgraph to research-relevant entities, activities, and assets (publications, datasets, affiliations, tools) (Chakraborty et al., 2022). Educational PKGs, as in MOOCs, are learner-centered subgraphs of Educational Knowledge Graphs capturing explicit concept-level knowledge gaps (Abdelmagied et al., 15 May 2025).
2. Data Sources, Ontologies, and Construction Pipelines
PKG construction blends structured and unstructured data ingestion, entity/relation extraction, semantic alignment, and ongoing synchronization:
- Data Sources:
  - Healthcare: EHRs, clinical notes, wearable/sensor streams, and standard medical ontologies (Bloor et al., 2023, Shirai et al., 2021).
  - Research: CVs, publication repositories, lab inventories, emails, chat logs (Chakraborty et al., 2022).
  - E-learning: clickstreams, search queries, document views, self-reports (Ilkou, 2022, Abdelmagied et al., 15 May 2025).
  - General: social media, user preferences, natural language input (Bernard et al., 2024, Skjæveland et al., 2023).
- Processing and Annotation:
  - Preprocessing: normalization, NER via domain- or SciERC-tuned transformers, time-series summarization for health (Bloor et al., 2023, Chakraborty et al., 2022).
  - Entity linking: via public KGs (Wikidata, DBpedia), applying contextual or embedding-based similarity (Bernard et al., 2024, Chakraborty et al., 2022, Shirai et al., 2021).
  - Triple extraction: joint NER and RE, pattern-based extraction, incremental updates (Chakraborty et al., 2022, Shirai et al., 2021).
- Semantic Integration:
  - Outbound linking to public/domain ontologies (e.g., SNOMED CT for diagnoses, DBpedia/Wikidata for general concepts, custom research or learning ontologies).
  - Alignment of personal predicates to standard schemas (e.g., mapping “likes”/“uses” to RDF properties) (Bernard et al., 2024, Chakraborty et al., 2022).
- Data Model:
  - RDF property graphs, with explicit provenance (e.g., pav:createdOn), access rights (pkg:readAccessRights/writeAccessRights), and weighted preferences (wi:weight) (Bernard et al., 2024).
  - Per-triple or per-node privacy controls (C_priv) and timestamps for temporal reasoning (Chakraborty et al., 2022, Ilkou, 2022).
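The extraction and linking steps can be illustrated with a deliberately small, standard-library-only sketch. It swaps in regular-expression patterns for trained NER/RE models and `difflib` string similarity for contextual or embedding-based linking; the catalogue, the `ex:` identifiers and predicates, and the 0.5 threshold are hypothetical placeholders, not Wikidata or DBpedia data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-catalogue standing in for a public KG such as Wikidata;
# the ex: identifiers are placeholders, not real entity ids.
CATALOGUE = {"ex:Jazz": "jazz", "ex:Python": "python", "ex:Oslo": "oslo"}

# Pattern-based extraction: surface phrase -> aligned schema predicate.
PATTERNS = {
    r"\bI (?:like|love|enjoy) (.+)": "ex:likes",
    r"\bI use (.+)": "ex:uses",
    r"\bI live in (.+)": "ex:livesIn",
}


def link(mention, threshold=0.5):
    """Link a mention to the most similar catalogue entry, or return None."""
    mention = mention.lower().strip(" .!?")
    best_id, best_score = None, 0.0
    for entity_id, label in CATALOGUE.items():
        score = SequenceMatcher(None, mention, label).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def extract(sentence, user="ex:me"):
    """Extract (user, predicate, entity) triples from one sentence."""
    triples = []
    for pattern, predicate in PATTERNS.items():
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            entity = link(match.group(1))
            if entity is not None:  # drop mentions that cannot be linked
                triples.append((user, predicate, entity))
    return triples


print(extract("I enjoy jazz music."))  # [('ex:me', 'ex:likes', 'ex:Jazz')]
print(extract("I live in Oslo"))       # [('ex:me', 'ex:livesIn', 'ex:Oslo')]
```

Unlinkable mentions are dropped rather than minted as new entities; a production pipeline might instead create a local node and flag it for review.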
PKG maintenance requires mechanisms for incremental ingestion, versioning, conflict detection, provenance auditing, and compliance with privacy regimes (HGPR/GDPR) (Shirai et al., 2021, Skjæveland et al., 2023, Ilkou, 2022).
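Conflict detection and versioning can be sketched as an append-only store in which a conflicting fact closes the validity interval of the current one, so history remains auditable. The resolution rule used here (accept a new value only if it is at least as recent and at least as confident as the current one) is an assumption chosen for illustration, as are the names `Fact` and `TemporalStore`; the cited systems may weigh recency, confidence, and provenance differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    confidence: float
    source: str                      # provenance of the assertion
    valid_to: Optional[date] = None  # None: still valid


class TemporalStore:
    """Versioned store for single-valued predicates (hypothetical sketch)."""

    def __init__(self):
        self.history = []   # every accepted version, never deleted
        self.rejected = []  # conflicting facts that lost resolution

    def current(self, subject, predicate):
        for fact in self.history:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.valid_to is None:
                return fact
        return None

    def ingest(self, fact):
        old = self.current(fact.subject, fact.predicate)
        if old is not None:
            if old.value == fact.value:
                return "duplicate"
            # Conflict detected: assumed policy is "newer and not less confident wins".
            if fact.valid_from >= old.valid_from and fact.confidence >= old.confidence:
                old.valid_to = fact.valid_from  # close the old validity interval
            else:
                self.rejected.append(fact)
                return "rejected"
        self.history.append(fact)
        return "accepted"


store = TemporalStore()
store.ingest(Fact("me", "livesIn", "Oslo", date(2020, 1, 1), 0.9, "cv.pdf"))
store.ingest(Fact("me", "livesIn", "Bergen", date(2023, 6, 1), 0.95, "email"))
print(store.current("me", "livesIn").value)  # Bergen
```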
3. Inference, Summarization, and PKG Adaptation
PKGs are subject to diverse inference and summarization operations to support personalized services, compact storage, and knowledge discovery:
- Rule-based and Model-based Inference:
  - Rule execution engines (e.g., APOC triggers in Neo4j) issue clinical alerts or drive personalized recommendations using subclass inference, threshold-based rules, and ontology-aligned heuristics (Bloor et al., 2023, Seneviratne et al., 2021).
  - Graph-based and statistical models (GNNs, embedding models) learn latent representations for link prediction, risk scoring, and query completion (Su et al., 2023, Chakraborty et al., 2022, Bloor et al., 2023).
- Summarization and Adaptation:
  - APEX and APEX-N provide adaptive, extreme summarization of PKGs under severe storage constraints, capturing user “interest drift” via heat-diffusion models and selecting the top-K utility-maximizing triples as interests shift (Li et al., 2024). The framework updates relevance scores incrementally at each update and has been validated under extreme compression on multi-million-triple KGs.
  - Neuro-symbolic adaptation frameworks support the dynamic restructuring of PKGs (soft/hard reweighting, targeted triple removal) to avoid over-personalization and filter bubble formation in LLM-based recommender systems (Spadea et al., 8 Sep 2025).
- Temporal/Incremental Update Protocols:
  - Validity intervals and entity joins handle conflicting facts and evolving user states, with resolution driven by confidence, recency, or explicit provenance (Chakraborty et al., 2022, Shirai et al., 2021).
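Heat-diffusion interest modelling combined with top-K triple selection can be illustrated with a generic sketch; it is not the APEX or APEX-N algorithm. Assumptions: diffusion runs on the undirected view of the graph with a few explicit Euler steps of `dh/dt = -L h` (L the graph Laplacian), interest drift is modelled by decaying old heat and injecting heat at newly queried nodes, and a triple's utility is the summed heat of its endpoints. The function names `diffuse`, `drift`, and `summarize` are hypothetical.

```python
from collections import defaultdict


def diffuse(edges, heat, alpha=0.2, steps=3):
    """Explicit-Euler heat diffusion h <- h - alpha * L h (generic sketch).

    `heat` maps node -> temperature; L is the combinatorial Laplacian of the
    undirected view of `edges`. alpha < 1 / max_degree keeps it stable.
    """
    neighbours = defaultdict(list)
    for head, _, tail in edges:
        neighbours[head].append(tail)
        neighbours[tail].append(head)
    h = {node: heat.get(node, 0.0) for node in neighbours}
    for _ in range(steps):
        # (L h)[v] = sum over neighbours u of (h[v] - h[u]); total heat is conserved.
        h = {v: h[v] - alpha * sum(h[v] - h[u] for u in neighbours[v]) for v in h}
    return h


def drift(heat, new_interests, decay=0.5):
    """Decay past interest and inject unit heat at newly queried nodes."""
    updated = {v: decay * t for v, t in heat.items()}
    for v in new_interests:
        updated[v] = updated.get(v, 0.0) + 1.0
    return updated


def summarize(edges, heat, k):
    """Keep the k triples whose endpoints carry the most diffused heat."""
    h = diffuse(edges, heat)
    return sorted(edges, key=lambda e: h[e[0]] + h[e[2]], reverse=True)[:k]


edges = [("me", "likes", "jazz"), ("jazz", "genreOf", "kind_of_blue"),
         ("me", "uses", "python"), ("python", "typeOf", "language")]
heat = {"jazz": 1.0}
print(summarize(edges, heat, k=2))  # the two jazz-related triples
heat = drift(heat, ["python"])
print(summarize(edges, heat, k=2))  # after the drift, the python triples
```

Because diffusion spreads heat to neighbours, a triple can be retained even if neither endpoint was queried directly, which is the intuition behind using diffusion rather than raw query counts.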
4. Access Control, Provenance, and Privacy
Robust access and provenance management systems are central to PKG integrity and user trust:
- Fine-Grained Rights:
  - Every assertion/triple is paired with explicit read/write rights (pkg:readAccessRights, pkg:writeAccessRights) denoting allowed agents/services (Bernard et al., 2024).
  - Role-based and attribute-level access control is enforced within property-graph (e.g., Neo4j) or triplestore (RDF/WAC) environments (Chakraborty et al., 2022, Skjæveland et al., 2023).
- Provenance Tracking:
  - Statements are annotated with creator (pav:createdBy), timestamp (pav:createdOn), and source linkage to support audit trails and update propagation (Bernard et al., 2024, Shirai et al., 2021).
- Privacy and Governance:
  - User-specified constraints govern retention, sharing, and data deletion (C_priv), with adaptive synchronization strategies to balance real-time updates and exposure risk (Ilkou, 2022, Shirai et al., 2021, Skjæveland et al., 2023).
These constraints also affect synchronization with upstream/downstream data sources, requiring bidirectional update propagation and conflict resolution policies, especially in sensitive domains (healthcare, education) (Shirai et al., 2021, Bloor et al., 2023).
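Per-statement rights, provenance annotations, and retention constraints can be modelled together in a few lines. In this sketch the field names only echo the vocabulary terms cited in this section (pkg:readAccessRights, pkg:writeAccessRights, pav:createdBy, pav:createdOn); it is a plain Python model, not an RDF serialization, and `retain_for` plus the helper functions are hypothetical stand-ins for a user-specified C_priv policy.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Statement:
    """A triple carrying access rights and provenance (illustrative sketch)."""
    triple: tuple
    created_by: str                                  # cf. pav:createdBy
    created_on: datetime                             # cf. pav:createdOn
    read_rights: set = field(default_factory=set)    # cf. pkg:readAccessRights
    write_rights: set = field(default_factory=set)   # cf. pkg:writeAccessRights
    retain_for: Optional[timedelta] = None           # None: keep until deleted


def readable(statements, agent):
    """Triples that `agent` may read."""
    return [s.triple for s in statements if agent in s.read_rights]


def revise(statement, agent, new_object, now):
    """Return a revised statement with fresh provenance, if `agent` may write."""
    if agent not in statement.write_rights:
        raise PermissionError(f"{agent} has no write access")
    subject, predicate, _ = statement.triple
    return replace(statement, triple=(subject, predicate, new_object),
                   created_by=agent, created_on=now)


def enforce_retention(statements, now):
    """Drop statements whose retention window has expired."""
    return [s for s in statements
            if s.retain_for is None or s.created_on + s.retain_for > now]


def audit(statements, triple):
    """Provenance trail: who asserted `triple`, and when."""
    return [(s.created_by, s.created_on) for s in statements if s.triple == triple]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
diagnosis = Statement(("me", "hasCondition", "asthma"), "clinic_app", t0,
                      {"me", "gp"}, {"me"})
visit = Statement(("me", "visited", "cafe"), "phone_gps", t0,
                  {"me", "maps_app"}, {"me"}, retain_for=timedelta(days=30))
print(readable([diagnosis, visit], "gp"))  # [('me', 'hasCondition', 'asthma')]
```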
5. Applications and Evaluation Methodologies
PKGs underpin a wide spectrum of applications, each evaluated via domain-specific and graph-theoretic metrics:
| Domain | Application Paradigm | Core Evaluation Metrics |
|---|---|---|
| Health | PHKG for monitoring/alerting | Recall, sensitivity, specificity, query time (Bloor et al., 2023) |
| Research | PRKG for assistant/recommendation | Extraction/linking F₁, MRR, user satisfaction (Chakraborty et al., 2022) |
| Recommender | Personalized, domain-aligned PKG | HR@10, NDCG@10, online click-through rate (Su et al., 2023) |
| E-learning | Learner-centric PKG, QG/RAG | Human relevance scores, explainability, user studies (Ilkou, 2022, Abdelmagied et al., 15 May 2025) |
| Knowledge Management | PKG API for statement mediation | Precision (NL2KG), latency, trust metrics (Bernard et al., 2024) |
- Health: COPD patient monitoring via PHKG improves query recall by approximately 12% and achieves alerting sensitivity/specificity of 85%/78% (Bloor et al., 2023).
- Recommender: MeKB-Rec’s PKG yields up to a 105% improvement in HR@10 for zero-shot cross-domain recommendation (CDR) users (Su et al., 2023).
- Summarization: APEX and APEX-N achieve real-time updating on KGs of up to 12M triples under extreme compression (Li et al., 2024).
- E-learning: PKG-driven question generation obtains mean fluency/relevance scores above 2.8/3.0 in expert evaluations (Abdelmagied et al., 15 May 2025).
PKG trust and utility are also assessed via coverage, link precision, consistency, response time, and user-centric satisfaction metrics (Skjæveland et al., 2023, Bernard et al., 2024).
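The ranking metrics in the table (HR@K, NDCG@K, MRR) can be computed as follows under a common leave-one-out protocol with a single held-out relevant item per user. That protocol is an assumption: the cited evaluations may sample candidates differently or truncate MRR at K, whereas the `evaluate` helper below (a hypothetical name) reports untruncated MRR.

```python
import math


def rank_of(target, ranking):
    """1-based rank of the held-out item, or None if it was not ranked."""
    return ranking.index(target) + 1 if target in ranking else None


def evaluate(targets, rankings, k=10):
    """HR@k, NDCG@k and MRR with exactly one relevant item per user."""
    hr = ndcg = mrr = 0.0
    for target, ranking in zip(targets, rankings):
        rank = rank_of(target, ranking)
        if rank is None:
            continue  # a missed item contributes zero to every metric
        mrr += 1.0 / rank
        if rank <= k:
            hr += 1.0
            # With a single relevant item the ideal DCG is 1, so NDCG = DCG.
            ndcg += 1.0 / math.log2(rank + 1)
    n = len(targets)
    return {"HR@k": hr / n, "NDCG@k": ndcg / n, "MRR": mrr / n}


metrics = evaluate(["a", "b", "c"], [["a", "x"], ["x", "b"], ["x", "y"]])
print(metrics["MRR"])  # 0.5
```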
6. Open Challenges and Future Research Directions
Active research addresses foundational and applied issues in PKG science:
- Standardization and Interoperability: Lack of shared vocabularies for PKG metadata (provenance, confidence, temporal context) and the absence of “PKG-ready” APIs limit cross-ecosystem compatibility (Skjæveland et al., 2023).
- Scalability and Summarization: Real-time summarization under shifting interests, particularly under extreme space constraints, remains an active area of algorithmic work (Li et al., 2024).
- Entity Resolution and Semantic Drift: Schema-free, high-fidelity entity resolution across heterogeneous PKGs is challenged by ontology alignment, attribute sparsity, and open-world growth (Kejriwal, 2023).
- Access Control Granularity and Governance: Dynamic, predicate-level access and secure on-device PKG management require tool support for natural language programming of policies and verifiable enforcement (Bernard et al., 2024, Skjæveland et al., 2023).
- Utilization Robustness and Filter-Bubble Avoidance: Maintaining recommendation diversity without sacrificing personalization in LLM-based systems motivates structure-aware PKG adaptation (Spadea et al., 8 Sep 2025).
- Explainability and Usability: Human-in-the-loop editors, provenance visualization, and user feedback mechanisms are required for trustworthy PKG curation (Shirai et al., 2021, Chakraborty et al., 2022).
Ongoing work targets economic, regulatory, and usability factors as critical enablers for PKG ecosystem adoption, complementing advances in extraction, summarization, and privacy-preserving reasoning (Skjæveland et al., 2023).