Hierarchical Semantic Decomposition
- Hierarchical semantic decomposition is a process that breaks down complex concepts into hierarchically organized, semantically meaningful sub-components.
- It employs LLM-based pipelines with prompt construction, normalization, and ontology alignment to ensure precision and depth accuracy.
- Evaluation metrics like semantic F1 and hierarchy-aware F1 quantify both semantic fidelity and granularity adherence, driving methodological improvements.
Hierarchical semantic decomposition encompasses a family of methods that formalize, implement, and evaluate the breakdown of complex concepts, skills, or data entities into semantically meaningful and hierarchically organized components. These methods are critical for bridging the gap between coarse, high-level descriptions and actionable, fine-grained representations across domains such as skill evaluation, computer vision, natural language understanding, and model interpretability. This article presents a technical synthesis of hierarchical semantic decomposition with an emphasis on the formalization, methodology, evaluation, empirical results, and implications as articulated in ontology-grounded LLM skill decomposition (Luyen et al., 13 Oct 2025), followed by contextualization within the broader research landscape.
1. Formal Definition and Conceptual Foundations
Hierarchical semantic decomposition is defined by the process of systematically producing a set of sub-concepts or sub-tasks from a parent node such that:
- Each child is a strictly more specific refinement of the parent.
- The set of children is structurally aligned with a predefined hierarchy, typically represented as a directed, labeled ontology $G = (V, E)$ where nodes $v \in V$ represent concepts (e.g., skills, competences), edges (e.g., hasSubSkill, skos:narrower) encode semantic relations, and $\ell(v)$ is a canonical surface form.
Let $p$ be a parent (source) node. A hierarchical semantic decomposition process generates candidates $c_1, \dots, c_k$ such that each $c_i$:
- Satisfies strict specificity with respect to $p$
- Is non-duplicative (uniqueness constraint)
- Represents a valid node in the target ontology (type conformity)
Two operational regimes are distinguished:
- Closed World: Only descendants within a fixed radius (e.g., direct children, $d = 1$) of $p$ are permissible outputs.
- Open World: Novel noun-phrase variants are allowed, subject to post-hoc alignment with the ontology.
Granularity is established via graph distance: direct children are at depth-1, grandchildren at depth-2, and so on. Accordingly, “correct” decompositions are those that identify true children at the correct depth, while deeper descendants may receive partial credit.
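To make the closed-world regime concrete, here is a minimal sketch of the permissible candidate set, assuming the ontology is held as a networkx DiGraph with parent-to-child edges; the function name and graph representation are illustrative assumptions, not artifacts of the paper.

```python
# Sketch: closed-world candidates are the descendants of p within radius d
# (d = 1 restricts to direct children). The DiGraph representation is assumed.
import networkx as nx

def candidate_set(onto: nx.DiGraph, p: str, d: int = 1) -> set[str]:
    dists = nx.single_source_shortest_path_length(onto, p, cutoff=d)
    return {v for v, dist in dists.items() if dist >= 1}  # exclude p itself
```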
2. Methodology and Pipeline Construction
The end-to-end pipeline for hierarchical semantic decomposition in LLM-based skill decomposition consists of the following stages (Luyen et al., 13 Oct 2025):
- Prompt Construction: Given a parent $p$ and a configuration $\theta$—where $\theta$ encodes parameters such as the target number of children $k$ and language constraints—a deterministic prompt builder specifies the required output schema and, in few-shot mode, incorporates in-context exemplars from disjoint parents, matched by depth-proximity.
- LLM Generation: The schema-constrained prompt is provided to the LLM, which decodes precisely $k$ candidate sub-skills.
- Decoding-Time Checks: Lightweight filters are applied, including noun-phrase validation, parent repetition suppression, and type checks. Output lists failing these heuristics are post-processed.
- Normalization and Deduplication: Surface forms are lowercased, whitespace-normalized, extraneous punctuation and hyphen artifacts are removed, and Sentence-BERT embeddings are clustered to eliminate paraphrastic duplicates.
- Ontology Alignment (candidate-to-node mapping): Each candidate $c$ is matched to a node $v$ in the descendant cone of $p$ by maximizing the cosine similarity between $E(c)$ and $E(\ell(v))$, subject to a confidence threshold $\tau$; unverifiable outputs are assigned no match ($\varnothing$). A sketch of these post-processing stages follows this list.
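A minimal sketch of the normalization, deduplication, and alignment stages, assuming a single shared Sentence-BERT embedding space; the thresholds DUP_T and TAU, the checkpoint name, and the helper names are illustrative placeholders rather than values from the paper.

```python
# Hedged sketch of post-processing: normalize surface forms, drop
# paraphrastic duplicates, and align candidates to ontology descendants.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
DUP_T = 0.90  # assumed paraphrase threshold for deduplication
TAU = 0.75    # assumed confidence threshold tau for alignment

def normalize(s: str) -> str:
    """Lowercase, collapse whitespace, strip trailing punctuation/hyphens."""
    s = re.sub(r"\s+", " ", s.lower().strip())
    return re.sub(r"[-\u2010-\u2015.,;:]+$", "", s).strip()

def dedup(cands: list[str]) -> list[str]:
    """Greedily keep the first member of each near-paraphrase cluster."""
    if not cands:
        return []
    embs = model.encode(cands, normalize_embeddings=True)
    kept: list[int] = []
    for i in range(len(cands)):
        if all(float(embs[i] @ embs[j]) < DUP_T for j in kept):
            kept.append(i)
    return [cands[i] for i in kept]

def align(cand: str, descendant_labels: list[str]) -> str | None:
    """Map a candidate to its best ontology descendant, or None if below tau."""
    embs = model.encode([cand] + descendant_labels, normalize_embeddings=True)
    sims = embs[0] @ embs[1:].T
    best = int(np.argmax(sims))
    return descendant_labels[best] if sims[best] >= TAU else None
```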
Zero-shot (ZS) and leakage-safe few-shot (FS) decomposition differ exclusively in context: ZS uses only the instruction, while FS conditions on label-disjoint, granularity-matched exemplars.
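As an illustration of the ZS/FS contrast, here is a hedged sketch of a deterministic prompt builder; the schema wording and exemplar format are assumptions, not the paper's exact template.

```python
# Illustrative deterministic prompt builder: identical instruction in both
# modes, with FS additionally prepending label-disjoint, depth-matched
# exemplars. The exact schema wording here is an assumption.
def build_prompt(parent: str, k: int,
                 exemplars: list[tuple[str, list[str]]] | None = None) -> str:
    lines = []
    for ex_parent, ex_children in exemplars or []:  # few-shot context
        lines.append(f"Skill: {ex_parent}")
        lines.append("Sub-skills: " + "; ".join(ex_children))
    lines.append(f"Skill: {parent}")
    lines.append(f"List exactly {k} strictly more specific sub-skills, "
                 "one noun phrase per line; do not repeat the parent skill.")
    return "\n".join(lines)
```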
3. Evaluation Metrics for Semantic and Hierarchy Fidelity
Two F1-style metrics are used to quantify decomposition quality:
a. Semantic F1 Score
For a parent $p$ with model predictions $\hat{C} = \{\hat{c}_1, \dots, \hat{c}_m\}$ and gold children $G = \{g_1, \dots, g_n\}$:
- A cosine similarity matrix $S \in \mathbb{R}^{m \times n}$ with $S_{ij} = \cos(E(\hat{c}_i), E(g_j))$ is constructed.
- An optimal bipartite matching $M$ (Hungarian algorithm) maximizes the total similarity $\sum_{(i,j) \in M} S_{ij}$.
- Precision, recall, and F1 per parent are $P = \tfrac{1}{m} \sum_{(i,j) \in M} S_{ij}$, $R = \tfrac{1}{n} \sum_{(i,j) \in M} S_{ij}$, and $F_1 = \tfrac{2PR}{P + R}$.
- The macro-average over all parents is reported as the semantic F1.
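A compact sketch of this computation under the reconstruction above, using scipy's Hungarian solver; the SBERT checkpoint, function name, and epsilon guard are illustrative choices, not the paper's.

```python
# Minimal sketch of semantic F1 via optimal bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint

def semantic_f1(predictions: list[str], gold: list[str]) -> float:
    P_emb = model.encode(predictions, normalize_embeddings=True)
    G_emb = model.encode(gold, normalize_embeddings=True)
    S = P_emb @ G_emb.T                      # cosine similarity matrix
    rows, cols = linear_sum_assignment(-S)   # Hungarian: maximize similarity
    matched = float(S[rows, cols].sum())
    precision = matched / len(predictions)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall + 1e-12)
```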
b. Hierarchy-Aware F1 Score
Captures granularity fidelity using node placement:
- Each valid match of a prediction to an ontology node $v$ is credited:
  - $1.0$ if $d(p, v) = 1$ ($v$ is a direct child)
  - $0.5$ if $v$ is a deeper descendant ($d(p, v) \ge 2$)
  - $0$ otherwise
- A credit matrix $H$ is constructed from these values; the matching and F1 calculations parallel the semantic case.
- Scores are macro-averaged as Hier-F1 (the credit function is sketched below).
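A minimal sketch of the depth-based credit, assuming the same networkx DiGraph ontology as in the earlier sketch; the function name and graph representation are assumptions for illustration.

```python
# Hedged sketch of hierarchy-aware credit: 1.0 for a direct child,
# 0.5 for a deeper descendant, 0 otherwise. The DiGraph representation
# is an assumption; the paper specifies only the crediting scheme.
import networkx as nx

def hierarchy_credit(onto: nx.DiGraph, parent: str, matched: str) -> float:
    try:
        d = nx.shortest_path_length(onto, source=parent, target=matched)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 0.0  # not in the descendant cone of the parent
    if d == 1:
        return 1.0  # direct child: correct granularity
    if d >= 2:
        return 0.5  # deeper descendant: partial credit
    return 0.0      # d == 0: the parent itself, no credit
```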
Together, these metrics differentiate content-accurate but mis-granular outputs from those that are both semantically and structurally compliant.
4. Empirical Results and Benchmark Analysis
Evaluation is conducted on ROME-ESCO-DecompSkill, a curated subset of the ROME 4.0 and ESCO ontologies (French, “Skills and Competences” pillar). The benchmark restricts attention to 288 parents with 5–12 children, ensuring actionable decompositions at non-trivial granularity.
Seven LLMs are compared under both ZS and FS conditions. Key observations (all F1/Hier-F1 macro-averaged):
- Zero-shot already achieves semantic F1 up to $0.49$, demonstrating that large pretrained LLMs encode robust prior knowledge of skill decomposition.
- Hierarchy-aware F1 is much lower under ZS (at most $0.12$). Few-shot consistently improves Hier-F1 (e.g., for DeepSeek V3 and GPT-5) and often boosts semantic F1 itself (e.g., K2 Instruct); the best observed configuration is Llama4 Scout under FS, on both F1 and Hier-F1.
- Latency is model- and prompt-dependent: for some models, the use of FS prompts actually reduces wall time due to more schema-conformant output and earlier completion, offsetting the cost of longer prompts. Wall times range from 4.7s to 169s per instance; the trade-off is empirically nontrivial.
A representative decomposition example for the parent “Data analysis” demonstrates that FS prompts better suppress spurious or over-general outputs, accurately identifying depth-1 children and avoiding unverifiable instances.
5. Applicability, Implications, and Limitations
Treating the ontology strictly as an evaluation scaffold (not as retrieval or generation corpus) establishes a reproducible, leakage-controlled protocol for testing hierarchical semantic decomposition systems. The joint use of semantic and hierarchy-aware metrics enables rigorous diagnosis of two error types: content drift and granularity drift, which pure generative or ontology-constrained strategies alone cannot disentangle.
Few-shot prompting with depth-matched, label-disjoint exemplars (selected via graph heuristics) serves as an effective structural prior. This suppresses over-generalization, reduces output phrasing variance, and improves both alignment and depth placement—effects especially pronounced in mid-scale LLMs.
Limitations include:
- Sensitivity of very large LLMs to exemplar selection, which can induce precision-recall trade-offs if exemplar depth diverges from the target.
- The reliance on cosine similarity thresholds in open-world evaluation, which can misalign strong paraphrases.
- The focus on a single language and ontology, raising questions of robustness across multilingual and multi-taxonomy settings.
Proposed future paths include retrieval-augmented generation with masked evidence, dynamic exemplar selection conditioned on graph properties, graph-constrained decoding for subtree-enforcement, and adaptation for multilingual or heterogeneous taxonomies.
6. Broader Context and Related Methodologies
Hierarchical semantic decomposition, as formalized here, shares conceptual similarities with:
- Recursive and hierarchical decomposition for 3D shape segmentation and unsupervised part discovery (Yu et al., 2019, Paschalidou et al., 2020), where hierarchies encode part-of relations and are enforced via recursive neural networks or binary trees.
- Hierarchical semantic segmentation in images (Li et al., 2022) via multi-label output heads and hierarchy consistency losses.
- Hierarchical post-hoc explanation in NLP (Jin et al., 2019), where non-additive, context-independent importances are attributed to compositional spans, with algorithms ensuring that attribution reflects true compositional structure rather than additive token effects.
- Hierarchical task decomposition in robotics (Liu et al., 5 Jun 2025), aligning LLM-generated task trees with control primitives and spatial semantic maps.
- Structured semantic priors in computer vision (Saini et al., 2019), which regularize local to global inferences using contextual hierarchies.
Across these domains, the unifying principle is to enforce and exploit explicit semantic structure across levels—either as inductive bias in training, generative prior in output, post-hoc explanation scaffold, or evaluation metric—bridging gaps between unstructured generations and human-understandable, actionable representations.
7. Summary Table: Core Components of Ontology-Grounded Hierarchical Semantic Decomposition
| Component | Description | Canonical Instance |
|---|---|---|
| Ontology | Directed graph of concepts and semantic relations | ESCO, ROME |
| Decomposition Method | LLM-generative (ZS/FS) prompt-based generation | Few-shot with depth-matched exemplars |
| Output Normalization | Syntactic cleaning and semantic clustering via embeddings | Sentence-BERT |
| Alignment | Cosine matching to ontology descendants with threshold $\tau$ | See Section 2 |
| Metrics | Semantic F1, hierarchy-aware F1 (Hungarian assignment) | See Section 3 |
| Evaluation Data | Benchmark with 5–12 child nodes per parent, varied depth | ROME-ESCO-DecompSkill |
| Latency Range | Model- and prompt-dependent: 4.7s–169s (per instance) | Llama4 Scout, DeepSeek, etc. |
Hierarchical semantic decomposition, as embodied in contemporary LLM and ontology research (Luyen et al., 13 Oct 2025), is a critical methodological advance for robust, interpretable, and actionable decomposition of complex concepts, bridging the granularity gap between high-level abstraction and low-level operationalization across cognitive and computational disciplines.