
Hierarchical Semantic Decomposition

Updated 16 November 2025
  • Hierarchical semantic decomposition is a process that breaks down complex concepts into hierarchically organized, semantically meaningful sub-components.
  • It employs LLM-based pipelines with prompt construction, normalization, and ontology alignment to ensure semantic precision and correct depth placement.
  • Evaluation metrics like semantic F1 and hierarchy-aware F1 quantify both semantic fidelity and granularity adherence, driving methodological improvements.

Hierarchical semantic decomposition encompasses a family of methods that formalize, implement, and evaluate the breakdown of complex concepts, skills, or data entities into semantically meaningful and hierarchically organized components. These methods are critical for bridging the gap between coarse, high-level descriptions and actionable, fine-grained representations across domains such as skill evaluation, computer vision, natural language understanding, and model interpretability. This article presents a technical synthesis of hierarchical semantic decomposition with an emphasis on the formalization, methodology, evaluation, empirical results, and implications as articulated in ontology-grounded LLM skill decomposition (Luyen et al., 13 Oct 2025), followed by contextualization within the broader research landscape.

1. Formal Definition and Conceptual Foundations

Hierarchical semantic decomposition is defined by the process of systematically producing a set of sub-concepts or sub-tasks from a parent node such that:

  • Each child is a strictly more specific refinement of the parent.
  • The set of children is structurally aligned with a predefined hierarchy, typically represented as a directed, labeled ontology $\mathcal{O} = (\mathcal{V}, \mathcal{E}, \mathcal{R}, \ell)$, where nodes $v \in \mathcal{V}$ represent concepts (e.g., skills, competences), edges (e.g., hasSubSkill, skos:narrower) encode semantic relations, and $\ell(v)$ is a canonical surface form.

Let $S_0 \in \mathcal{V}$ be a parent (source) node. A hierarchical semantic decomposition process generates candidates $\hat{s}_1, \dots, \hat{s}_k \in \Sigma^*$ such that each:

  • Satisfies strict specificity with respect to $S_0$
  • Is non-duplicative (uniqueness constraint)
  • Represents a valid node in the target ontology (type conformity)

Two operational regimes are distinguished:

  • Closed World: Only descendants within a fixed radius $d$ (e.g., direct children, $d = 1$) of $S_0$ are permissible outputs.
  • Open World: Novel noun-phrase variants are allowed, subject to post-hoc alignment with the ontology.

Granularity is established via graph distance: direct children are at depth-1, grandchildren at depth-2, and so on. Accordingly, “correct” decompositions are those that identify true children at the correct depth, while deeper descendants may receive partial credit.
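
As a minimal illustration of this depth convention, the sketch below represents the ontology as a plain parent-to-children adjacency map and computes the graph distance of every node in the descendant cone of a source skill. The skill names and the dictionary representation are hypothetical simplifications, not the benchmark's data model.

```python
from collections import deque

# Hypothetical fragment of a skill ontology as a parent -> direct-children map.
ontology = {
    "data analysis": ["statistical analysis", "data visualisation"],
    "statistical analysis": ["hypothesis testing"],
}

def descendant_depths(ontology: dict[str, list[str]], source: str) -> dict[str, int]:
    """Breadth-first traversal of Desc(source): direct children get depth 1,
    grandchildren depth 2, and so on, matching the granularity convention."""
    depths: dict[str, int] = {}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for child in ontology.get(node, []):
            if child not in depths:          # visit each descendant once
                depths[child] = d + 1
                queue.append((child, d + 1))
    return depths

print(descendant_depths(ontology, "data analysis"))
# {'statistical analysis': 1, 'data visualisation': 1, 'hypothesis testing': 2}
```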

2. Methodology and Pipeline Construction

The end-to-end pipeline for hierarchical semantic decomposition in LLM-based skill decomposition consists of the following stages (Luyen et al., 13 Oct 2025):

  1. Prompt Construction: Given $(S_0, C)$, where $C$ encodes parameters such as the target number $k$ and language constraints, a deterministic prompt builder specifies the required output schema and, in few-shot mode, incorporates $k_{fs} \in \{2, 3\}$ in-context exemplars $\mathcal{E}_{fs}$ drawn from disjoint parents, matched by depth-proximity.
  2. LLM Generation: The schema-constrained prompt is provided to the LLM, which decodes exactly $k$ candidate sub-skills.
  3. Decoding-Time Checks: Lightweight filters are applied, including noun-phrase validation, parent repetition suppression, and type checks. Output lists failing these heuristics are post-processed.
  4. Normalization and Deduplication: Surface forms are lowercased, whitespace-normalized, extraneous punctuation and hyphen artifacts are removed, and Sentence-BERT embeddings are clustered to eliminate paraphrastic duplicates.
  5. Ontology Alignment ($\alpha$-mapping): Each candidate $\hat{s}$ is matched to a node $v$ in the descendant cone $\mathrm{Desc}(S_0) \subset \mathcal{V}$ by maximizing cosine similarity between $E(\hat{s})$ and $E(\ell(v))$, with a confidence threshold $\tau = 0.78$. Unverifiable outputs are assigned $\alpha(\hat{s}) = \bot$; a minimal sketch of this step, together with normalization, follows this list.
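
The following sketch illustrates steps 4–5 under stated assumptions: the Sentence-BERT encoder name, the normalization rules, and the candidate strings are placeholders for illustration, and the clustering-based deduplication is omitted for brevity; only the cosine threshold $\tau = 0.78$ is taken from the text.

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer  # Sentence-BERT embeddings

TAU = 0.78                                       # confidence threshold from the text
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not necessarily the paper's

def normalize(surface: str) -> str:
    """Lowercase, collapse whitespace, strip stray punctuation and hyphen artifacts."""
    s = re.sub(r"\s+", " ", surface.lower().strip())
    return re.sub(r"[^\w\s-]", "", s).strip("- ")

def align(candidates: list[str], descendant_labels: list[str]) -> dict[str, str | None]:
    """alpha-mapping: map each normalized candidate to its most similar descendant
    label, or to None (standing in for the bottom symbol) if similarity < TAU."""
    cand = [normalize(c) for c in candidates]
    E_c = model.encode(cand, normalize_embeddings=True)              # unit-norm rows
    E_d = model.encode(descendant_labels, normalize_embeddings=True)
    sims = E_c @ E_d.T                                               # cosine similarities
    mapping: dict[str, str | None] = {}
    for i, c in enumerate(cand):
        j = int(np.argmax(sims[i]))
        mapping[c] = descendant_labels[j] if sims[i, j] >= TAU else None
    return mapping
```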

Zero-shot (ZS) and leakage-safe few-shot (FS) decomposition differ exclusively in context: ZS uses only the instruction, while FS conditions on label-disjoint, granularity-matched exemplars.
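
A hedged sketch of the deterministic prompt builder is shown below; the instruction wording, output schema, and exemplar format are assumptions made for illustration, and only the structural contrast between the two regimes (instruction-only versus label-disjoint, depth-matched exemplars) reflects the text.

```python
def build_prompt(parent: str, k: int, language: str = "fr",
                 exemplars: list[tuple[str, list[str]]] | None = None) -> str:
    """Deterministic prompt builder: zero-shot if `exemplars` is None,
    leakage-safe few-shot if 2-3 depth-matched exemplars are supplied."""
    lines = [
        f"Decompose the skill '{parent}' into exactly {k} strictly more specific sub-skills.",
        f"Answer in {language}, one noun phrase per line; do not repeat the parent skill.",
    ]
    for ex_parent, ex_children in (exemplars or []):   # few-shot mode, k_fs in {2, 3}
        lines.append(f"Example: '{ex_parent}' ->")
        lines.extend(f"- {c}" for c in ex_children)
    lines.append(f"Now decompose: '{parent}' ->")
    return "\n".join(lines)

# Zero-shot:
# build_prompt("data analysis", k=6)
# Few-shot, with a hypothetical label-disjoint exemplar:
# build_prompt("data analysis", k=6,
#              exemplars=[("project management", ["budget planning", "risk assessment"])])
```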

3. Evaluation Metrics for Semantic and Hierarchy Fidelity

Two $F_1$-style metrics are used to quantify decomposition quality:

a. Semantic $F_1$ Score

For a parent $u$ with model predictions $\hat{s}_1, \dots, \hat{s}_p$ and gold children $g_1, \dots, g_q$:

  • A cosine similarity matrix $S_{ij} = \cos(E(\hat{s}_i), E(\ell(g_j)))$ is constructed.
  • The optimal bipartite matching $M_u$ (Hungarian algorithm) maximizes the total similarity $\Sigma_u$.
  • Precision, recall, and $F_1$ per parent are:

$$P^{(u)}_{\mathrm{sem}} = \frac{\Sigma_u}{p} \qquad R^{(u)}_{\mathrm{sem}} = \frac{\Sigma_u}{q} \qquad F1^{(u)}_{\mathrm{sem}} = \frac{2\,P^{(u)}_{\mathrm{sem}}\,R^{(u)}_{\mathrm{sem}}}{P^{(u)}_{\mathrm{sem}} + R^{(u)}_{\mathrm{sem}} + \epsilon}$$

  • The macro-average is reported as $\overline{F1}_{\mathrm{sem}}$; a minimal implementation sketch follows this list.
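
The sketch below computes this per-parent semantic $F_1$ from a precomputed similarity matrix; the example matrix is hypothetical, and scipy's assignment solver stands in for the Hungarian step.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def semantic_f1(S: np.ndarray, eps: float = 1e-8) -> float:
    """Per-parent semantic F1: optimal bipartite matching over the p x q
    prediction/gold cosine-similarity matrix, then F1 on the matched mass."""
    rows, cols = linear_sum_assignment(S, maximize=True)  # matching M_u
    sigma = S[rows, cols].sum()                           # Sigma_u
    p, q = S.shape
    precision, recall = sigma / p, sigma / q
    return 2 * precision * recall / (precision + recall + eps)

# Hypothetical 3-prediction x 2-gold-child similarity matrix:
S = np.array([[0.91, 0.20],
              [0.15, 0.84],
              [0.30, 0.25]])
print(round(semantic_f1(S), 3))  # ~0.7: the unmatched third prediction lowers precision
```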

b. Hierarchy-Aware $F_1$ Score

Captures granularity fidelity using node placement:

  • Each valid match $(\hat{s}_i, g_j)$ is credited:
    • $1.0$ if $\alpha(\hat{s}_i) = g_j$ (direct child)
    • $0.5$ if $\alpha(\hat{s}_i)$ is a deeper descendant
    • $0$ otherwise
  • The matrix $H_{ij} = S_{ij} \cdot \mathrm{credit}(\hat{s}_i, g_j)$ is constructed; matching and $F_1$ calculations parallel the semantic case.
  • Macro-averaged as $\overline{F1}_{\mathrm{hier}}$; a sketch of the credited variant follows this list.
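
Continuing the previous sketch, the hierarchy-aware variant weights each similarity by a placement credit before matching; the `alignment`, `gold`, and `depths` inputs are illustrative stand-ins for the $\alpha$-mapping and the descendant-depth map, not an interface from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def credit(aligned: str | None, gold_child: str, depths: dict[str, int]) -> float:
    """1.0 for an exact direct-child placement, 0.5 for a deeper descendant, else 0."""
    if aligned is None:
        return 0.0
    if aligned == gold_child:
        return 1.0
    return 0.5 if depths.get(aligned, 0) >= 2 else 0.0

def hierarchy_f1(S: np.ndarray, alignment: list[str | None], gold: list[str],
                 depths: dict[str, int], eps: float = 1e-8) -> float:
    """Hierarchy-aware F1: credit-weighted similarities, then the same matching and F1."""
    H = np.array([[S[i, j] * credit(alignment[i], gold[j], depths)
                   for j in range(len(gold))] for i in range(len(alignment))])
    rows, cols = linear_sum_assignment(H, maximize=True)
    sigma = H[rows, cols].sum()
    p, q = H.shape
    precision, recall = sigma / p, sigma / q
    return 2 * precision * recall / (precision + recall + eps)
```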

Together, these metrics distinguish content-accurate but mis-granular outputs from outputs that are both semantically and structurally compliant.

4. Empirical Results and Benchmark Analysis

Evaluation is conducted on ROME-ESCO-DecompSkill, a curated subset of the ROME 4.0 and ESCO ontologies (French, “Skills and Competences” pillar). The benchmark restricts attention to 288 parents with 5–12 children, ensuring actionable decompositions at non-trivial granularity.

Seven LLMs are compared under both ZS and FS conditions. Key observations (all $F_1$/Hier-$F_1$ values macro-averaged):

  • Zero-shot achieves $F_1 \approx 0.39$–$0.49$, demonstrating that large pretrained LLMs encode robust prior knowledge of skill decomposition.
  • Hierarchy-aware $F_1$ is much lower for ZS ($\approx 0.04$–$0.12$). Few-shot consistently improves Hier-$F_1$ (e.g., DeepSeek V3 $0.0656 \rightarrow 0.0879$, GPT-5 $0.0425 \rightarrow 0.0727$) and often boosts $F_1$ itself (K2 Instruct $0.4353 \rightarrow 0.4604$). The best observed result is Llama4 Scout FS ($F_1 = 0.4902$, Hier-$F_1 = 0.1399$).
  • Latency is model- and prompt-dependent: for some models, the use of FS prompts actually reduces wall time due to more schema-conformant output and earlier completion, offsetting the cost of longer prompts. Wall times range from 4.7s to 169s per instance; the trade-off is empirically nontrivial.

A representative decomposition example for the parent “Data analysis” demonstrates that FS prompts better suppress spurious or over-general outputs, accurately identifying depth-1 children and avoiding unverifiable instances.

5. Applicability, Implications, and Limitations

Treating the ontology strictly as an evaluation scaffold (not as retrieval or generation corpus) establishes a reproducible, leakage-controlled protocol for testing hierarchical semantic decomposition systems. The joint use of semantic and hierarchy-aware metrics enables rigorous diagnosis of two error types: content drift and granularity drift, which pure generative or ontology-constrained strategies alone cannot disentangle.

Few-shot prompting with depth-matched, label-disjoint exemplars (selected via graph heuristics) serves as an effective structural prior. This suppresses over-generalization, reduces output phrasing variance, and improves both alignment and depth placement—effects especially pronounced in mid-scale LLMs.

Limitations include:

  • Sensitivity of very large LLMs to exemplar selection, which can induce precision-recall trade-offs if exemplar depth diverges from the target.
  • The reliance on cosine similarity thresholds in open-world evaluation, which can misalign strong paraphrases.
  • The focus on a single language and ontology, raising questions of robustness across multilingual and multi-taxonomy settings.

Proposed future paths include retrieval-augmented generation with masked evidence, dynamic exemplar selection conditioned on graph properties, graph-constrained decoding for subtree-enforcement, and adaptation for multilingual or heterogeneous taxonomies.

6. Relationship to Hierarchical Decomposition in Other Domains

Hierarchical semantic decomposition, as formalized here, shares conceptual similarities with:

  • Recursive and hierarchical decomposition for 3D shape segmentation and unsupervised part discovery (Yu et al., 2019, Paschalidou et al., 2020), where hierarchies encode part-of relations and are enforced via recursive neural networks or binary trees.
  • Hierarchical semantic segmentation in images (Li et al., 2022) via multi-label output heads and hierarchy consistency losses.
  • Hierarchical post-hoc explanation in NLP (Jin et al., 2019), where non-additive, context-independent importances are attributed to compositional spans, with algorithms ensuring that attribution reflects true compositional structure rather than additive token effects.
  • Hierarchical task decomposition in robotics (Liu et al., 5 Jun 2025), aligning LLM-generated task trees with control primitives and spatial semantic maps.
  • Structured semantic priors in computer vision (Saini et al., 2019), which regularize local to global inferences using contextual hierarchies.

Across these domains, the unifying principle is to enforce and exploit explicit semantic structure across levels—either as inductive bias in training, generative prior in output, post-hoc explanation scaffold, or evaluation metric—bridging gaps between unstructured generations and human-understandable, actionable representations.

7. Summary Table: Core Components of Ontology-Grounded Hierarchical Semantic Decomposition

| Component | Description | Canonical Instance |
| --- | --- | --- |
| Ontology $\mathcal{O}$ | Directed graph of concepts and semantic relations | ESCO, ROME |
| Decomposition Method | LLM-generative (ZS/FS) prompt-based generation | Few-shot with depth-matched exemplars |
| Output Normalization | Syntactic cleaning and semantic clustering via embeddings | Sentence-BERT |
| Alignment | Cosine-matching to ontology descendants with threshold $\tau$ | $\tau = 0.78$ |
| Metrics | Semantic $F_1$, hierarchy-aware $F_1$ (Hungarian assignment) | See Section 3 |
| Evaluation Data | Benchmark with 5–12 child nodes per parent, varied depth | ROME-ESCO-DecompSkill |
| Latency Range | Model- and prompt-dependent: 4.7 s–169 s per instance | Llama4 Scout, DeepSeek, etc. |

Hierarchical semantic decomposition, as embodied in contemporary LLM and ontology research (Luyen et al., 13 Oct 2025), is a critical methodological advance for robust, interpretable, and actionable decomposition of complex concepts, bridging the granularity gap between high-level abstraction and low-level operationalization across cognitive and computational disciplines.
