Hierarchical Semantic Decomposition
- Hierarchical semantic decomposition is a process that breaks down complex concepts into hierarchically organized, semantically meaningful sub-components.
- It employs LLM-based pipelines with prompt construction, normalization, and ontology alignment to ensure precision and depth accuracy.
- Evaluation metrics like semantic F1 and hierarchy-aware F1 quantify both semantic fidelity and granularity adherence, driving methodological improvements.
Hierarchical semantic decomposition encompasses a family of methods that formalize, implement, and evaluate the breakdown of complex concepts, skills, or data entities into semantically meaningful and hierarchically organized components. These methods are critical for bridging the gap between coarse, high-level descriptions and actionable, fine-grained representations across domains such as skill evaluation, computer vision, natural language understanding, and model interpretability. This article presents a technical synthesis of hierarchical semantic decomposition with an emphasis on the formalization, methodology, evaluation, empirical results, and implications as articulated in ontology-grounded LLM skill decomposition (Luyen et al., 13 Oct 2025), followed by contextualization within the broader research landscape.
1. Formal Definition and Conceptual Foundations
Hierarchical semantic decomposition is defined by the process of systematically producing a set of sub-concepts or sub-tasks from a parent node such that:
- Each child is a strictly more specific refinement of the parent.
- The set of children is structurally aligned with a predefined hierarchy, typically represented as a directed, labeled ontology $G = (V, E)$ where nodes $v \in V$ represent concepts (e.g., skills, competences), edges (e.g., hasSubSkill, skos:narrower) encode semantic relations, and $\ell(v)$ is a canonical surface form.
Let $p$ be a parent (source) node. A hierarchical semantic decomposition process generates candidates $c_1, \dots, c_k$ such that each $c_i$:
- Satisfies strict specificity with respect to $p$
- Is non-duplicative (uniqueness constraint)
- Represents a valid node in the target ontology (type conformity)
Two operational regimes are distinguished:
- Closed World: Only descendants within a fixed radius (e.g., direct children, $d = 1$) of $p$ are permissible outputs.
- Open World: Novel noun-phrase variants are allowed, subject to post-hoc alignment with the ontology.
Granularity is established via graph distance: direct children are at depth-1, grandchildren at depth-2, and so on. Accordingly, “correct” decompositions are those that identify true children at the correct depth, while deeper descendants may receive partial credit.
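To make the closed-world regime concrete, here is a minimal sketch of the permissible candidate set, assuming the ontology is held as a networkx DiGraph with parent-to-child edges; the function name and graph representation are illustrative assumptions, not artifacts of the paper.

```python
# Sketch: closed-world candidates are the descendants of p within radius d
# (d = 1 restricts to direct children). The DiGraph representation is assumed.
import networkx as nx

def candidate_set(onto: nx.DiGraph, p: str, d: int = 1) -> set[str]:
    dists = nx.single_source_shortest_path_length(onto, p, cutoff=d)
    return {v for v, dist in dists.items() if dist >= 1}  # exclude p itself
```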
2. Methodology and Pipeline Construction
The end-to-end pipeline for hierarchical semantic decomposition in LLM-based skill decomposition consists of the following stages (Luyen et al., 13 Oct 2025):
- Prompt Construction: Given a parent $p$ and a configuration $\theta$—where $\theta$ encodes parameters such as the target number of children $k$ and language constraints—a deterministic prompt builder specifies the required output schema and, in few-shot mode, incorporates in-context exemplars from disjoint parents, matched by depth-proximity.
- LLM Generation: The schema-constrained prompt is provided to the LLM, which decodes precisely $k$ candidate sub-skills.
- Decoding-Time Checks: Lightweight filters are applied, including noun-phrase validation, parent repetition suppression, and type checks. Output lists failing these heuristics are post-processed.
- Normalization and Deduplication: Surface forms are lowercased, whitespace-normalized, extraneous punctuation and hyphen artifacts are removed, and Sentence-BERT embeddings are clustered to eliminate paraphrastic duplicates.
- Ontology Alignment (candidate-to-node mapping): Each candidate $c$ is matched to a node $v$ in the descendant cone of $p$ by maximizing the cosine similarity between $E(c)$ and $E(\ell(v))$, subject to a confidence threshold $\tau$; unverifiable outputs are assigned no match ($\varnothing$). A sketch of these post-processing stages follows this list.
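A minimal sketch of the normalization, deduplication, and alignment stages, assuming a single shared Sentence-BERT embedding space; the thresholds DUP_T and TAU, the checkpoint name, and the helper names are illustrative placeholders rather than values from the paper.

```python
# Hedged sketch of post-processing: normalize surface forms, drop
# paraphrastic duplicates, and align candidates to ontology descendants.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
DUP_T = 0.90  # assumed paraphrase threshold for deduplication
TAU = 0.75    # assumed confidence threshold tau for alignment

def normalize(s: str) -> str:
    """Lowercase, collapse whitespace, strip trailing punctuation/hyphens."""
    s = re.sub(r"\s+", " ", s.lower().strip())
    return re.sub(r"[-\u2010-\u2015.,;:]+$", "", s).strip()

def dedup(cands: list[str]) -> list[str]:
    """Greedily keep the first member of each near-paraphrase cluster."""
    if not cands:
        return []
    embs = model.encode(cands, normalize_embeddings=True)
    kept: list[int] = []
    for i in range(len(cands)):
        if all(float(embs[i] @ embs[j]) < DUP_T for j in kept):
            kept.append(i)
    return [cands[i] for i in kept]

def align(cand: str, descendant_labels: list[str]) -> str | None:
    """Map a candidate to its best ontology descendant, or None if below tau."""
    embs = model.encode([cand] + descendant_labels, normalize_embeddings=True)
    sims = embs[0] @ embs[1:].T
    best = int(np.argmax(sims))
    return descendant_labels[best] if sims[best] >= TAU else None
```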
Zero-shot (ZS) and leakage-safe few-shot (FS) decomposition differ exclusively in context: ZS uses only the instruction, while FS conditions on label-disjoint, granularity-matched exemplars.
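As an illustration of the ZS/FS contrast, here is a hedged sketch of a deterministic prompt builder; the schema wording and exemplar format are assumptions, not the paper's exact template.

```python
# Illustrative deterministic prompt builder: identical instruction in both
# modes, with FS additionally prepending label-disjoint, depth-matched
# exemplars. The exact schema wording here is an assumption.
def build_prompt(parent: str, k: int,
                 exemplars: list[tuple[str, list[str]]] | None = None) -> str:
    lines = []
    for ex_parent, ex_children in exemplars or []:  # few-shot context
        lines.append(f"Skill: {ex_parent}")
        lines.append("Sub-skills: " + "; ".join(ex_children))
    lines.append(f"Skill: {parent}")
    lines.append(f"List exactly {k} strictly more specific sub-skills, "
                 "one noun phrase per line; do not repeat the parent skill.")
    return "\n".join(lines)
```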
3. Evaluation Metrics for Semantic and Hierarchy Fidelity
Two F1-style metrics are used to quantify decomposition quality:
a. Semantic F1 Score
For a parent $p$ with model predictions $\hat{C} = \{\hat{c}_1, \dots, \hat{c}_m\}$ and gold children $G = \{g_1, \dots, g_n\}$:
- A cosine similarity matrix $S \in \mathbb{R}^{m \times n}$ with $S_{ij} = \cos(E(\hat{c}_i), E(g_j))$ is constructed.
- An optimal bipartite matching $M$ (Hungarian algorithm) maximizes the total similarity $\sum_{(i,j) \in M} S_{ij}$.
- Precision, recall, and F1 per parent are $P = \tfrac{1}{m} \sum_{(i,j) \in M} S_{ij}$, $R = \tfrac{1}{n} \sum_{(i,j) \in M} S_{ij}$, and $F_1 = \tfrac{2PR}{P + R}$.
- The macro-average over all parents is reported as the semantic F1.
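A compact sketch of this computation under the reconstruction above, using scipy's Hungarian solver; the SBERT checkpoint, function name, and epsilon guard are illustrative choices, not the paper's.

```python
# Minimal sketch of semantic F1 via optimal bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint

def semantic_f1(predictions: list[str], gold: list[str]) -> float:
    P_emb = model.encode(predictions, normalize_embeddings=True)
    G_emb = model.encode(gold, normalize_embeddings=True)
    S = P_emb @ G_emb.T                      # cosine similarity matrix
    rows, cols = linear_sum_assignment(-S)   # Hungarian: maximize similarity
    matched = float(S[rows, cols].sum())
    precision = matched / len(predictions)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall + 1e-12)
```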
b. Hierarchy-Aware F1 Score
Captures granularity fidelity using node placement:
- Each valid match of a prediction to an ontology node $v$ is credited:
  - $1.0$ if $d(p, v) = 1$ ($v$ is a direct child)
  - $0.5$ if $v$ is a deeper descendant ($d(p, v) \ge 2$)
  - $0$ otherwise
- A credit matrix $H$ is constructed from these values; the matching and F1 calculations parallel the semantic case.
- Scores are macro-averaged as Hier-F1 (the credit function is sketched below).
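A minimal sketch of the depth-based credit, assuming the same networkx DiGraph ontology as in the earlier sketch; the function name and graph representation are assumptions for illustration.

```python
# Hedged sketch of hierarchy-aware credit: 1.0 for a direct child,
# 0.5 for a deeper descendant, 0 otherwise. The DiGraph representation
# is an assumption; the paper specifies only the crediting scheme.
import networkx as nx

def hierarchy_credit(onto: nx.DiGraph, parent: str, matched: str) -> float:
    try:
        d = nx.shortest_path_length(onto, source=parent, target=matched)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 0.0  # not in the descendant cone of the parent
    if d == 1:
        return 1.0  # direct child: correct granularity
    if d >= 2:
        return 0.5  # deeper descendant: partial credit
    return 0.0      # d == 0: the parent itself, no credit
```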
Together, these metrics differentiate content-accurate but mis-granular outputs from those that are both semantically and structurally compliant.
4. Empirical Results and Benchmark Analysis
Evaluation is conducted on ROME-ESCO-DecompSkill, a curated subset of the ROME 4.0 and ESCO ontologies (French, “Skills and Competences” pillar). The benchmark restricts attention to 288 parents with 5–12 children, ensuring actionable decompositions at non-trivial granularity.
Seven LLMs are compared under both ZS and FS conditions. Key observations (all F1/Hier-F1 macro-averaged):
- Zero-shot already achieves semantic F1 up to $0.49$, demonstrating that large pretrained LLMs encode robust prior knowledge of skill decomposition.
- Hierarchy-aware F1 is much lower under ZS (at most $0.12$). Few-shot consistently improves Hier-F1 (e.g., for DeepSeek V3 and GPT-5) and often boosts semantic F1 itself (e.g., K2 Instruct); the best observed configuration is Llama4 Scout under FS, on both F1 and Hier-F1.
- Latency is model- and prompt-dependent: for some models, the use of FS prompts actually reduces wall time due to more schema-conformant output and earlier completion, offsetting the cost of longer prompts. Wall times range from 4.7s to 169s per instance; the trade-off is empirically nontrivial.
A representative decomposition example for the parent “Data analysis” demonstrates that FS prompts better suppress spurious or over-general outputs, accurately identifying depth-1 children and avoiding unverifiable instances.
5. Applicability, Implications, and Limitations
Treating the ontology strictly as an evaluation scaffold (not as retrieval or generation corpus) establishes a reproducible, leakage-controlled protocol for testing hierarchical semantic decomposition systems. The joint use of semantic and hierarchy-aware metrics enables rigorous diagnosis of two error types: content drift and granularity drift, which pure generative or ontology-constrained strategies alone cannot disentangle.
Few-shot prompting with depth-matched, label-disjoint exemplars (selected via graph heuristics) serves as an effective structural prior. This suppresses over-generalization, reduces output phrasing variance, and improves both alignment and depth placement—effects especially pronounced in mid-scale LLMs.
Limitations include:
- Sensitivity of very large LLMs to exemplar selection, which can induce precision-recall trade-offs if exemplar depth diverges from the target.
- The reliance on cosine similarity thresholds in open-world evaluation, which can misalign strong paraphrases.
- The focus on a single language and ontology, raising questions of robustness across multilingual and multi-taxonomy settings.
Proposed future paths include retrieval-augmented generation with masked evidence, dynamic exemplar selection conditioned on graph properties, graph-constrained decoding for subtree-enforcement, and adaptation for multilingual or heterogeneous taxonomies.
6. Broader Context and Related Methodologies
Hierarchical semantic decomposition, as formalized here, shares conceptual similarities with:
- Recursive and hierarchical decomposition for 3D shape segmentation and unsupervised part discovery (Yu et al., 2019, Paschalidou et al., 2020), where hierarchies encode part-of relations and are enforced via recursive neural networks or binary trees.
- Hierarchical semantic segmentation in images (Li et al., 2022) via multi-label output heads and hierarchy consistency losses.
- Hierarchical post-hoc explanation in NLP (Jin et al., 2019), where non-additive, context-independent importances are attributed to compositional spans, with algorithms ensuring that attribution reflects true compositional structure rather than additive token effects.
- Hierarchical task decomposition in robotics (Liu et al., 5 Jun 2025), aligning LLM-generated task trees with control primitives and spatial semantic maps.
- Structured semantic priors in computer vision (Saini et al., 2019), which regularize local to global inferences using contextual hierarchies.
Across these domains, the unifying principle is to enforce and exploit explicit semantic structure across levels—either as inductive bias in training, generative prior in output, post-hoc explanation scaffold, or evaluation metric—bridging gaps between unstructured generations and human-understandable, actionable representations.
7. Summary Table: Core Components of Ontology-Grounded Hierarchical Semantic Decomposition
| Component | Description | Canonical Instance |
|---|---|---|
| Ontology | Directed graph of concepts and semantic relations | ESCO, ROME |
| Decomposition Method | LLM-generative (ZS/FS) prompt-based generation | Few-shot with depth-matched exemplars |
| Output Normalization | Syntactic cleaning and semantic clustering via embeddings | Sentence-BERT |
| Alignment | Cosine matching to ontology descendants with threshold $\tau$ | See Section 2 |
| Metrics | Semantic F1, hierarchy-aware F1 (Hungarian assignment) | See Section 3 |
| Evaluation Data | Benchmark with 5–12 child nodes per parent, varied depth | ROME-ESCO-DecompSkill |
| Latency Range | Model- and prompt-dependent: 4.7s–169s (per instance) | Llama4 Scout, DeepSeek, etc. |
Hierarchical semantic decomposition, as embodied in contemporary LLM and ontology research (Luyen et al., 13 Oct 2025), is a critical methodological advance for robust, interpretable, and actionable decomposition of complex concepts, bridging the granularity gap between high-level abstraction and low-level operationalization across cognitive and computational disciplines.