
Hierarchical Knowledge Graphs

Updated 18 December 2025
  • Hierarchical Knowledge Graphs are structured representations that organize entities into multi-level graphs reflecting taxonomic, containment, and progression relationships.
  • They employ methodologies like unsupervised clustering, path-based tag clustering, and Bayesian hierarchical blockmodelling to induce robust hierarchies.
  • They enable improved reasoning and link prediction through tailored embedding methods such as hyperbolic and polar embeddings, enhancing performance in multi-modal applications.

A hierarchical knowledge graph (HKG) is a structured data representation in which entities, events, or documents are organized into a multi-level, tree-like or directed acyclic graph (DAG) structure, where higher levels correspond to more abstract or general concepts and lower levels encode finer, more specific classes or relations. Hierarchical organization captures semantic containment, taxonomic relationships, part-whole decompositions, or progression across scale, supporting both efficient representation of real-world hierarchies and principled reasoning over parent–child dependencies.

1. Theoretical Foundation and Formalism

The formal basis of HKGs is the augmentation of a standard knowledge graph—typically defined as a set of triples $(h, r, t)$ where entities and relations are drawn from a finite vocabulary—with an explicit hierarchical structure. Hierarchies may manifest as:

  • Taxonomies (e.g., “mammal → primate → human” in WordNet/Freebase/YAGO) with strict parent–child ordering.
  • Containment trees or DAGs over entity groups, where each group represents a set of entities defined by predicates or co-membership relations (Mohamed, 2019).
  • Blockmodels and probabilistic representations where entities are assigned to community paths in a tree, as in nested Chinese Restaurant Process-based models (Pietrasik et al., 28 Aug 2024).
  • Layered, multi-level graphs spanning atomic facts, fact nodes, and hyper-relational/nested constructs (Liu et al., 11 Nov 2024).
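
The common core of these forms can be captured by a flat triple store paired with an explicit parent–child DAG. The following is a minimal sketch (class and method names are illustrative, not drawn from any of the cited systems):

```python
from collections import defaultdict

class HierarchicalKG:
    """Minimal HKG sketch: flat (h, r, t) triples plus an explicit
    parent->child hierarchy over entities/classes, stored as a DAG."""

    def __init__(self):
        self.triples = set()               # (head, relation, tail)
        self.children = defaultdict(set)   # parent -> {child, ...}
        self.parents = defaultdict(set)    # child  -> {parent, ...}

    def add_triple(self, h, r, t):
        self.triples.add((h, r, t))

    def add_is_a(self, child, parent):
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def ancestors(self, node):
        """All strict ancestors of `node` (transitive closure over parents)."""
        seen, stack = set(), list(self.parents[node])
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.parents[p])
        return seen

kg = HierarchicalKG()
kg.add_is_a("primate", "mammal")           # the WordNet-style taxonomy above
kg.add_is_a("human", "primate")
kg.add_triple("human", "capableOf", "language")
```

Storing parents as a set (rather than a single pointer) keeps the structure DAG-friendly, so the same sketch covers both strict taxonomies and containment DAGs over entity groups.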

A key mathematical property in categorical hierarchy theory is path-equality: all paths between two objects in the skeleton must correspond to the same morphism in the instantiating category, providing a commutative foundation for update propagation and for representing knowledge schemas (Harmer et al., 2020).

2. Hierarchy Construction Methodologies

Multiple algorithmic strategies have been developed for inducing, learning, or extending hierarchical structure over knowledge graphs:

2.1 Unsupervised Containment and Grouping

Turn predicate–object pairs (e.g., “LivesIn, Europe”) into candidate entity groups, define containment via the hub-promoted index $S^{HPI}(g_i, g_j)$, and induce parent–child edges when containment exceeds a threshold $\theta$. Hierarchies are constructed greedily, enabling tolerance to noise and missing data (Mohamed, 2019).
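
This induction step can be sketched as follows, assuming the standard hub-promoted index (overlap normalised by the smaller group) and using illustrative parameter names $\theta$ (containment threshold) and $\alpha$ (minimum group size):

```python
def hpi(gi, gj):
    """Hub-promoted index: overlap normalised by the smaller group."""
    return len(gi & gj) / min(len(gi), len(gj)) if gi and gj else 0.0

def induce_edges(groups, theta=0.9, alpha=2):
    """Greedy parent-child induction sketch: a larger group becomes the
    parent of a smaller one when their HPI exceeds `theta`.  Groups below
    the minimum size `alpha` are discarded as unreliable."""
    named = [(name, g) for name, g in groups.items() if len(g) >= alpha]
    edges = []
    for pname, pg in named:
        for cname, cg in named:
            if pname != cname and len(pg) > len(cg) and hpi(pg, cg) >= theta:
                edges.append((pname, cname))
    return edges

groups = {
    "LivesIn,Europe": {"a", "b", "c", "d"},
    "LivesIn,France": {"a", "b"},
}
# "LivesIn,Europe" fully contains "LivesIn,France", so an edge is induced.
```

Setting $\theta < 1$ is what provides the noise tolerance: a child group may contain a few entities missing from its parent and still be attached.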

2.2 Path-Based Tag Clustering

Transform each KG triple into subject–tag pairs (e.g., $s, t = (r, o)$), then rank tags by generality $G_t$, recursively build a tag tree with directed path-based similarities $S_{t_a \to t_b}$, and assign entities to clusters by maximizing Jaccard-like belonging scores. This yields interpretable, fine-grained trees matching human-crafted taxonomies (Pietrasik et al., 2021).
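
The ingredients of this pipeline can be sketched compactly; the exact $G_t$ and $S_{t_a \to t_b}$ definitions in the paper are more involved, and here generality is approximated simply by tag frequency:

```python
from collections import defaultdict

def build_subject_tags(triples):
    """Turn (s, r, o) triples into subject -> tag mappings, tag = (r, o)."""
    tags = defaultdict(set)
    for s, r, o in triples:
        tags[s].add((r, o))
    return tags

def tag_generality(tags):
    """Score each tag by how many subjects carry it -- a simple frequency
    proxy for the generality score G_t used to order the tag tree."""
    counts = defaultdict(int)
    for ts in tags.values():
        for t in ts:
            counts[t] += 1
    return counts

def belonging(subject_tags, cluster_tags):
    """Jaccard-like belonging score used to assign a subject to a cluster."""
    union = subject_tags | cluster_tags
    return len(subject_tags & cluster_tags) / len(union) if union else 0.0

triples = [("alice", "LivesIn", "Europe"), ("bob", "LivesIn", "Europe"),
           ("alice", "BornIn", "France")]
tags = build_subject_tags(triples)
```

More general tags (here, the one shared by both subjects) sit higher in the induced tree; each entity is then attached to the cluster maximizing its belonging score.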

2.3 Hierarchical Blockmodelling

Apply nonparametric Bayesian models (nested CRP for cluster paths, stick-breaking for level mixing) to assign each entity a path through a tree of latent communities, with edge generation modeled via Beta-Bernoulli distributed block interactions. Hierarchy structure, level assignment, and inter-block probabilities are jointly inferred via collapsed Gibbs sampling (Pietrasik et al., 28 Aug 2024).
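
The nested CRP prior over root-to-leaf community paths can be sketched as below; this shows only the path-sampling prior, not the joint Gibbs inference over blocks and levels described in the paper:

```python
import random

def ncrp_path(tree, counts, depth, gamma=1.0, rng=random):
    """Sample a root-to-leaf path via the nested Chinese Restaurant Process:
    at each level, pick an existing child with probability proportional to
    its visit count, or open a new child with probability prop. to gamma."""
    path, node = [], ()
    for _ in range(depth):
        kids = tree.get(node, [])
        weights = [counts[(node, k)] for k in kids] + [gamma]
        r = rng.uniform(0, sum(weights))
        acc, choice = 0.0, None
        for k, w in zip(kids, weights):
            acc += w
            if r <= acc:
                choice = k
                break
        if choice is None:                     # fell in the gamma mass:
            choice = len(kids)                 # open a brand-new child
            tree.setdefault(node, []).append(choice)
        counts[(node, choice)] = counts.get((node, choice), 0) + 1
        node = node + (choice,)
        path.append(node)
    return path
```

Because visit counts accumulate across entities, popular branches attract more entities (the "rich get richer" dynamic), while $\gamma$ controls how readily new communities are opened.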

2.4 LLM–Driven Augmentation

Leverage LLMs (e.g., GPT-4) with few-shot prompting and hierarchical generation to induce or densify taxonomies in existing KGs. Classification and parent–child edge generation modules, supported by expert-defined top-level categories, can yield comprehensive multi-level hierarchies with high coverage and user-rated relevance (Sharma et al., 11 Apr 2024).
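
A hypothetical prompt builder for the parent–child placement step might look like the following; the function name, field layout, and wording are all illustrative (no actual LLM API is called):

```python
def taxonomy_prompt(category, examples, candidates):
    """Hypothetical few-shot prompt for LLM-driven hierarchy induction:
    an expert-defined top-level `category`, labelled (child, parent)
    `examples` as the few-shot demonstrations, and unplaced `candidates`."""
    shots = "\n".join(f"- {child} -> {parent}" for child, parent in examples)
    items = "\n".join(f"- {c}" for c in candidates)
    return (
        f"Top-level category: {category}\n"
        f"Examples of child -> parent placements:\n{shots}\n"
        f"Place each of the following entities under the most specific "
        f"existing parent, or propose a new intermediate class:\n{items}\n"
        f"Answer as 'child -> parent', one per line."
    )
```

Cyclical prompting would feed each round of placements back as new few-shot examples, trading error propagation against richer context, as discussed in Section 5.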

2.5 Matrix Factorization of Textual Corpora

Perform multi-modal hierarchical nonnegative matrix factorization over TF–IDF, word–co-occurrence, and category matrices, recursively decomposing document corpora into topic trees. Hierarchical relations among topic, document, and entity nodes form the backbone of the resulting multimodal KG (Barron et al., 24 Mar 2024).
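
The recursive decomposition can be sketched with a toy multiplicative-update NMF on plain nested lists; this is illustrative only (real systems use optimised, multi-modal factorizations over TF–IDF and co-occurrence matrices), and all names are assumptions:

```python
import random

def nmf(V, k, iters=200, seed=0):
    """Tiny multiplicative-update NMF, V ~= W H, on nested lists."""
    rng = random.Random(seed)
    n, m, eps = len(V), len(V[0]), 1e-9
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WH = [[sum(W[i][a] * H[a][j] for a in range(k))
               for j in range(m)] for i in range(n)]
        for a in range(k):                     # H <- H * (W^T V)/(W^T W H)
            for j in range(m):
                num = sum(W[i][a] * V[i][j] for i in range(n))
                den = sum(W[i][a] * WH[i][j] for i in range(n)) + eps
                H[a][j] *= num / den
        WH = [[sum(W[i][a] * H[a][j] for a in range(k))
               for j in range(m)] for i in range(n)]
        for i in range(n):                     # W <- W * (V H^T)/(W H H^T)
            for a in range(k):
                num = sum(V[i][j] * H[a][j] for j in range(m))
                den = sum(WH[i][j] * H[a][j] for j in range(m)) + eps
                W[i][a] *= num / den
    return W, H

def topic_tree(V, docs, k=2, depth=2):
    """Recursively split a document set into k topics per level, assigning
    each document to its argmax topic -- a sketch of hierarchical NMF."""
    if depth == 0 or len(docs) <= k:
        return {"docs": docs}
    W, _ = nmf([V[d] for d in docs], k)
    buckets = [[] for _ in range(k)]
    for row, d in zip(W, docs):
        buckets[row.index(max(row))].append(d)
    return {"docs": docs,
            "children": [topic_tree(V, b, k, depth - 1) for b in buckets if b]}
```

Each recursion level re-factorizes only the documents assigned to a branch, so the resulting tree of topic nodes, with documents attached at the leaves, forms the backbone described above.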

3. Hierarchical Embedding and Representation Learning

Encoding hierarchical structure in embeddings is critical for downstream reasoning and link prediction:

  • Polar embeddings (HAKE): Map entities to polar coordinates, with radial (modulus) representing depth and angular (phase) differentiating siblings. Scoring combines both components; ablation studies confirm modulus captures vertical hierarchy and phase discriminates siblings (Zhang et al., 2019).
  • Hyperbolic embeddings (Poincaré): Exploit the negative curvature of hyperbolic space to capture exponential branching with low distortion. HypHKGE introduces attention-based learnable curvature for each entity–relation pair ($c_{h,r}$) and orthogonal relation transformation layers for both inter-level (scaling radial depth) and intra-level (Givens rotation for siblings) movement, significantly improving low-dimensional link prediction (Zheng et al., 2022).
  • Unified hierarchical representations (UniHR): Abstract facts spanning hyper-relational, temporal, or nested structures into a three-layer data graph (atomic nodes, relation nodes, fact nodes), processed by a hierarchy-aware GNN employing both intra- and inter-fact message passing for robust structure learning across heterogeneous KGs (Liu et al., 11 Nov 2024).
  • Neural autoencoding: Stack energy-based models (DBNs) whose hidden layers mirror the alternation of property-based splits, letting latent codes internalize an entity’s root-to-leaf path in the conceptual hierarchy (Murphy, 2019).
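
The polar decomposition in HAKE can be made concrete with a small scoring sketch, following the modulus/phase structure described above (the negated distance serves as the plausibility score; `lam` is the phase-weighting hyperparameter):

```python
import math

def hake_score(h_mod, h_ph, r_mod, r_ph, t_mod, t_ph, lam=0.5):
    """HAKE-style polar score sketch: moduli encode hierarchy depth
    (element-wise scaling h_m * r_m should land on t_m), phases separate
    siblings at the same depth (h_p + r_p should match t_p, modulo 2*pi).
    A perfect match scores 0; worse matches score more negatively."""
    modulus = math.sqrt(sum((hm * rm - tm) ** 2
                            for hm, rm, tm in zip(h_mod, r_mod, t_mod)))
    phase = sum(abs(math.sin((hp + rp - tp) / 2.0))
                for hp, rp, tp in zip(h_ph, r_ph, t_ph))
    return -(modulus + lam * phase)
```

The ablations cited below correspond to zeroing out one of the two terms: the modulus term alone captures vertical (parent–child) structure, the phase term alone distinguishes entities at the same level.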

4. Hierarchical Knowledge Graphs in Real-World Applications

HKGs underpin a broad array of practical systems:

  • Exploratory search: Multi-layer HKGs combine flat KG connectivity with higher-level central-concept views to support faceted navigation and detail-on-demand; empirical studies demonstrate parity with flat KGs on sensemaking performance, but better interpretability for open-ended “Learn” tasks (Sarrafzadeh et al., 2020).
  • Narrative/visual media understanding: HKGs structured into panel-, event-, and story-level graphs, augmented by semantic normalization via embedding-based clustering, reduce redundancy and support robust cross-granular reasoning tasks in multimodal corpora (e.g., Manga109) (Chen, 20 Aug 2025).
  • Academic survey and recommendation: Tree-structured paper KGs built on top of citation (inheritance) and semantic “issue finding”–“issue resolved” edges allow for both research lineage tracing and problem–solution mapping; SVM/SciBERT classifiers label key sentences, supporting semi-automated construction (Li et al., 7 Feb 2024).
  • Biomedical event extraction: Sentence-specific subgraphs spanning token, concept, and semantic-type tiers are distilled from UMLS. Edge-conditioned attention GNNs propagate context from high-level types down to tokens, improving extraction of complex events (Huang et al., 2020).
  • Schema management: Hierarchical path-equality enforced via sesqui-pushout (SqPO) rewriting yields strict propagation of updates through both data and schema layers, with formal categorical and implementation guarantees (Harmer et al., 2020).

5. Empirical Insights and Performance Benchmarks

Hierarchical models consistently outperform flat or non-hierarchical baselines on tasks requiring semantic generalization or hierarchical reasoning:

  • Embeddings: HypHKGE achieves a +14.2% relative MRR gain over the best Euclidean baselines at low dimension (d=32) on YAGO3-10, with ablations confirming the necessity of the inter-level, intra-level, and adaptive-curvature components (Zheng et al., 2022). HAKE surpasses RotatE in MRR on WN18RR/FB15K-237/YAGO3-10; modulus-only and phase-only ablations confirm that the modulus encodes hierarchy depth while the phase discriminates siblings (Zhang et al., 2019).
  • Probabilistic models: The hierarchical blockmodel's ARI rises monotonically from ~0.3 (top level) to ~0.83 (leaf level) on synthetic trees; on real subsets of FB15k-237 and WikiData, ARI/NMI are on par with or better than embedding-plus-clustering baselines (Pietrasik et al., 28 Aug 2024).
  • Augmented hierarchies: Transformer-augmented KGs reach 98–99% hierarchy coverage in intent and color domains, with >95% placement relevance as judged by human spot-checking (Sharma et al., 11 Apr 2024). Cyclical and one-shot prompting yield complementary trade-offs in error propagation and context handling.

6. Technical Challenges and Future Directions

Key open issues span scalability, robustness, semantic fidelity, and interpretability:

  • Scalability: Nonparametric probabilistic inference (e.g., Gibbs for nCRP) is computationally intensive for large KGs; variational or Poisson-link approximations, as well as parallelization, are promising alternatives (Pietrasik et al., 28 Aug 2024).
  • Noise and incompleteness: Greedy containment thresholds ($\theta < 1$) and minimum group sizes ($\alpha$) allow for error-tolerant induction but may miss subtle structure (Mohamed, 2019). Embedding methods may benefit from hybridizing explicit group structure with learned hierarchies.
  • Semantic normalization: Hierarchical KGs in narrative domains profit from lexical and embedding-based canonicalization, reducing ontology size and annotation noise (~25% reduction in Manga109) (Chen, 20 Aug 2025).
  • Reasoning: Formal logic, categorical rewriting (SqPO), and learned geometric/algebraic transformations are complementary approaches to enabling controlled propagation, update, and deduction across hierarchical structures (Harmer et al., 2020, Liu et al., 11 Nov 2024).
  • Unified vs. specialized models: UniHR demonstrates the feasibility of a single hierarchical encoder across hyper-relational, temporal, and nested fact types; further generalization to inductive and multi-modal KGs is an explicit research objective (Liu et al., 11 Nov 2024).

Continued progress will target scalable learning, richer semantic integration (text, images, temporal), dynamic hierarchy refinement, and deeper evaluation of user-guided exploration, particularly in high-noise, heterogeneous-ontology settings.
