Hierarchical Skill Graphs Overview

Updated 2 April 2026

Hierarchical skill graphs are formal structures organizing both atomic and compound skills in directed, multi-level graphs for adaptive planning.
They are constructed using methods like graph partitioning, embedding-based learning, and trajectory distillation to optimize hierarchical control and skill transfer.
Their practical impact includes enhanced sample efficiency, robust performance in robotics, and scalability in multi-agent and dynamic task environments.

Hierarchical skill graphs are formal structures for organizing, composing, and adapting temporally extended skills (options, policies, or behaviors) in multi-level architectures. Their graph-based abstractions enable agents, robots, and artificial systems to plan, learn, generalize, and transfer competencies across tasks, domains, and representations. Nodes typically correspond to skills or skill modules at varying abstraction levels, and edges encode invocation, composition, semantic dependencies, or contextual applicability.

1. Formal Foundations and Structural Taxonomies

Hierarchical skill graphs are directed (sometimes acyclic) graphs $G = (V, E)$, with nodes $V$ representing either atomic (“primitive”) or compound (“meta”, “option”, “module”) skills, and edges $E$ denoting possible invocations, compositional flows, or context-dependent transitions. Formalizations range from simple DAGs (e.g., modular RL skill pipelines) to layered semantic knowledge graphs with additional entity and relation types.

Canonical variants include:

Option and Skill-Option Graphs: Each node is a temporally extended option (initiation set, internal policy, termination); edges represent “calls” or “expansions” when upper-level skills invoke subskills, producing a multi-level, often tree- or DAG-structured call graph (Evans et al., 2023, Konidaris, 2015).
Multi-entity Knowledge Graphs: Nodes may encode skills, environments, tasks, and their embeddings; directed edges capture contextual applicability, such as “skill $\sigma_i$ is applicable in environment $e_j$” (Zhu et al., 9 Jul 2025, Zhang et al., 2023).
Recursive/Layered Trees: Explicit ontologies or taxonomies with several abstraction tiers—for example, VerbNet-inspired action classes, verb instances, templated skill descriptions, and example trajectories—often constructed as rooted directed trees (Xie et al., 3 Mar 2026).
Coupled Graphs: Hybrid frameworks separate task, scene, and state graphs, each with domain-specific nodes and edges, with meta-relations (“require”, “obtain”) interlinking layers to mediate symbolic-to-physical execution (Qi et al., 2024).

Hierarchical relationships arise via containment (parent meta-skills composed of child atomic skills), logical precondition/effect links (sequential execution), or embedding-based similarity (for transfer and composition) (Xia et al., 9 Feb 2026, Mao et al., 2024).
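
The definition $G = (V, E)$ with atomic and compound nodes and typed edges can be made concrete with a small data structure. The following is a minimal sketch, not any cited paper's implementation: the class name `SkillGraph`, the relation labels `"invokes"` and `"precedes"`, and the example skills are all hypothetical. It also shows how to check whether a given graph falls in the acyclic (DAG) subclass.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class SkillGraph:
    """Directed graph G = (V, E) whose nodes are atomic or compound skills."""
    kinds: dict = field(default_factory=dict)  # node -> "atomic" | "compound"
    edges: dict = field(default_factory=lambda: defaultdict(list))  # src -> [(relation, dst)]

    def add_skill(self, name, kind):
        if kind not in ("atomic", "compound"):
            raise ValueError(f"unknown skill kind: {kind}")
        self.kinds[name] = kind

    def add_edge(self, src, relation, dst):
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be registered skills")
        self.edges[src].append((relation, dst))

    def children(self, name, relation=None):
        """Targets of outgoing edges, optionally filtered by relation label."""
        return [d for r, d in self.edges.get(name, []) if relation is None or r == relation]

    def is_acyclic(self):
        """Kahn's algorithm: the graph is a DAG iff every node can be removed."""
        indegree = {v: 0 for v in self.kinds}
        for src in self.edges:
            for _, dst in self.edges[src]:
                indegree[dst] += 1
        queue = deque(v for v, d in indegree.items() if d == 0)
        removed = 0
        while queue:
            v = queue.popleft()
            removed += 1
            for _, dst in self.edges.get(v, []):
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    queue.append(dst)
        return removed == len(self.kinds)


# Hypothetical example: one compound skill containing three atomic skills.
g = SkillGraph()
for name in ("reach", "grasp", "lift"):
    g.add_skill(name, "atomic")
g.add_skill("pick_up", "compound")
for child in ("reach", "grasp", "lift"):
    g.add_edge("pick_up", "invokes", child)   # containment / invocation
g.add_edge("reach", "precedes", "grasp")      # sequential (precondition/effect) link
g.add_edge("grasp", "precedes", "lift")

print(g.children("pick_up", "invokes"))  # ['reach', 'grasp', 'lift']
print(g.is_acyclic())                    # True
```

Keeping the relation label on each edge lets one structure hold both containment and sequencing links; knowledge-graph variants would add further node kinds (tasks, environments) in the same way.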

2. Algorithmic Construction and Graph Learning

Hierarchical skill graphs are constructed via a variety of automated, data-driven procedures:

Graph Partitioning and Modularity Maximization: For MDP environments, the state-transition graph is clustered recursively (e.g., using Louvain modularity maximization), yielding nested partitions. Each partitioning level defines clusters, and inter-cluster transitions induce options (skills), which then form the nodes and edges of the multi-level hierarchy (Evans et al., 2023). Option policies are typically learned via macro-Q-learning, with higher-level options built by recursively calling lower-level sub-options.
Skill-Symbol Loop: Alternating phases of skill (option) discovery and representation (state abstraction) acquisition construct a sequence of progressively more abstract MDPs. At each phase, new options induce propositional symbols for initiation/effect sets, clustering the state space into abstract states and recursively constructing the hierarchical skill graph (Konidaris, 2015).
Embedding-based Knowledge Graph Construction: In multi-task/multi-agent RL, skills, tasks, and contexts are encoded into feature vectors via learned encoders. Typed edges (e.g., task→skill, environment→skill) are embedded (often using TransH or similar models) and scored for contextual applicability. Graph structure and node embeddings are optimized jointly via loss functions mixing positive/negative/soft triples and context similarity (Zhu et al., 9 Jul 2025, Zhang et al., 2023).
Trajectory and Experience Distillation: In language-model–driven agents, raw execution trajectories are distilled—via experience-based methods or an LLM teacher—into compact, transferable skill policies or heuristics. These are organized into hierarchical graphs by semantic category and relevance, supporting interpretability and adaptive retrieval (Xia et al., 9 Feb 2026).
Semantic Task Graph Extraction: Robotic skill libraries and manipulation planners hierarchically organize knowledge by parsing demonstrations, procedural videos, or language instructions into multi-level graphs (e.g., task graph, scene graph, state graph); subtask decompositions are performed by LLMs, and motion-level or physical adaptations are integrated using search algorithms and sensory-driven corrections (Qi et al., 2024, Mao et al., 2024, Xie et al., 3 Mar 2026).
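
To illustrate the partition-based construction, the sketch below shows how inter-cluster transitions can induce options once a partition of the state-transition graph is available. It assumes the cluster assignment has already been computed elsewhere (for example by a modularity-maximization routine); the function name `induce_options`, the dictionary layout, and the choice of termination set (the entry states of the destination cluster) are illustrative assumptions rather than the exact formulation in the cited work. Option policies, which would be learned separately, are not modeled.

```python
from collections import defaultdict


def induce_options(transitions, cluster_of):
    """Return one option per ordered pair of adjacent clusters.

    transitions: iterable of (state, next_state) pairs from the
                 state-transition graph.
    cluster_of:  dict mapping each state to its cluster id (assumed given).
    """
    # (src_cluster, dst_cluster) -> states of dst reachable in one step from src
    boundary = defaultdict(set)
    for s, s_next in transitions:
        c, c_next = cluster_of[s], cluster_of[s_next]
        if c != c_next:
            boundary[(c, c_next)].add(s_next)

    members = defaultdict(set)
    for s, c in cluster_of.items():
        members[c].add(s)

    options = {}
    for (c, c_next), entry_states in boundary.items():
        options[(c, c_next)] = {
            "initiation_set": frozenset(members[c]),     # anywhere in the source cluster
            "termination_set": frozenset(entry_states),  # on entering the target cluster
        }
    return options


# Hypothetical six-state corridor 0-1-2-3-4-5, split into two clusters.
transitions = []
for s in range(5):
    transitions += [(s, s + 1), (s + 1, s)]
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

options = induce_options(transitions, cluster_of)
print(sorted(options))                                # [('A', 'B'), ('B', 'A')]
print(sorted(options[("A", "B")]["termination_set"]))  # [3]
```

Passing a coarser `cluster_of` (one that maps states to higher-level clusters) to the same function yields the options for the next level of the hierarchy.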

3. Planning, Control, and Adaptation over Skill Graphs

Planning and execution in hierarchical skill graph frameworks exploit the explicitly encoded multi-level structure to realize efficient, robust, and generalizable policy synthesis:

Semantic-level and Skill-aware Planning: High-level task specifications (user instructions, subgoal sequences, abstract objectives) are mapped to paths through the skill graph (sequences of meta-skills), either via LLM-based decomposition or combinatorial search. Node preconditions/effects and context eligibility are checked against the world state at each step, supporting dynamic adaptation and fallbacks (Mao et al., 2024, Yu et al., 13 Mar 2026).
Recursive or Layered Execution: Higher-level skills expand recursively into sequences of lower-level options or primitives, with each lower-level skill executing its policy contingent upon local context. Execution may involve explicit call-stacks (option graphs), policy banks (for skill selection), or procedural composition operators (differentiable embedding fusion) (Evans et al., 2023, Sahni et al., 2017, Zhu et al., 9 Jul 2025).
Adaptive Retrieval and Fine-tuning: At inference, graph-embedding–based retrieval selects the most semantically or contextually pertinent skills for the current task/environment, with adaptive fine-tuning for distributional shifts (domain adaptation) or online learning when necessary (Zhu et al., 9 Jul 2025, Xia et al., 9 Feb 2026, Najar, 25 Jan 2026).
Closed-loop Improvement: Skill graphs facilitate continual learning by enabling systematic data logging (skill outcomes, context transitions), online updating of evaluators (success probabilities, cost functions), and dynamic expansion of the skill set (incorporation of new capabilities as new nodes) (Yu et al., 13 Mar 2026, Xie et al., 3 Mar 2026).
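
Recursive expansion and precondition/effect checking can be sketched together in a few lines. This is a simplified illustration under stated assumptions: the world state is a set of symbolic facts, effects are (add, delete) pairs, the call graph is acyclic, and all skill and fact names (`make_tea`, `kettle_full`, and so on) are invented for the example.

```python
def expand(skill, children):
    """Depth-first expansion of a skill into an ordered list of primitives.

    children maps a compound skill to its ordered subskills; any skill
    without an entry is treated as atomic. Assumes an acyclic call graph.
    """
    subskills = children.get(skill)
    if not subskills:
        return [skill]
    plan = []
    for sub in subskills:
        plan.extend(expand(sub, children))
    return plan


def execute(plan, state, preconditions, effects):
    """Apply primitives in order, checking preconditions against the state.

    Returns (final_state, failed_skill); failed_skill is None on success.
    """
    state = set(state)
    for skill in plan:
        if not preconditions.get(skill, set()) <= state:
            return state, skill  # the caller may replan or fall back here
        add, delete = effects.get(skill, (set(), set()))
        state = (state - delete) | add
    return state, None


# Hypothetical two-level hierarchy.
children = {
    "make_tea": ["boil_water", "steep"],
    "boil_water": ["fill_kettle", "heat_kettle"],
}
preconditions = {"heat_kettle": {"kettle_full"}, "steep": {"water_hot"}}
effects = {
    "fill_kettle": ({"kettle_full"}, set()),
    "heat_kettle": ({"water_hot"}, set()),
    "steep": ({"tea_ready"}, {"water_hot"}),
}

plan = expand("make_tea", children)
print(plan)  # ['fill_kettle', 'heat_kettle', 'steep']

final_state, failed = execute(plan, set(), preconditions, effects)
print(sorted(final_state), failed)  # ['kettle_full', 'tea_ready'] None
```

Returning the failing skill rather than raising an error leaves the fallback decision to the caller, which is where a replanner or an LLM-based decomposer would plug in.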

4. Empirical Results and Quantitative Evaluation

Empirical studies demonstrate the practical impact and advantages of hierarchical skill graphs across a range of domains:

Hierarchical RL and Modular Agents: Multi-level skill graph architectures constructed via modularity maximization in discrete/continuous MDPs yield substantial speedups, superior learning curves, and deeper hierarchies than flat or single-level baselines, with robust performance as state space scales (e.g., up to 8 levels at $10^6$ states in grid environments) (Evans et al., 2023).
Sample Efficiency and Generalization: Directed skill graphs in real-time complex environments (e.g., action RPGs) enable rapid learning of upstream (low-level) skills such as camera, lock-on, and movement, which converge in 5k–15k steps, and support targeted adaptation of downstream modules (dodging, heal-attack), achieving win rates of $44\%$ (Phase 1) and $52\%$ (transfer) with order-of-magnitude lower data requirements than monolithic DQN baselines (Najar, 25 Jan 2026).
Multi-Task Multi-Agent and Zero-Shot Transfer: Graph-based high-level skill selectors outperform hierarchical MAPPO baselines on previously unseen multi-agent task/policy combinations, and the baselines' success rates degrade as skill count grows. Fast fine-tuning via the skill graph structure also improves sample efficiency (Zhu et al., 9 Jul 2025).
Robustness in Physical Robotics: Hierarchical skill awareness (scene/task/state graphs, adaptive tactile parameterization) enables successful transfer of manipulation policies across reconfigured task instances where end-to-end RL baselines fail (e.g., “drawer-to-door” manipulation) (Qi et al., 2024). Autonomous assembly via skill graphs achieves longer assembly “survival length” and successful builds across complex LEGO tasks (Yu et al., 13 Mar 2026).
Scalability: Skill graphs scale to hundreds of skills and large numbers of agents (e.g., 320 skills in RSG; 50-robot MARL scenarios), while maintaining compact storage and constant-time graph traversals (Zhang et al., 2023, Zhu et al., 9 Jul 2025).

5. Semantic, Symbolic, and Knowledge-Driven Extensions

Hierarchical skill graphs integrate semantic and symbolic reasoning into the skill-centric paradigm:

VerbNet-inspired and Semantic Taxonomies: Extraction of a four-level hierarchy (VerbNet classes, verbs, templated descriptions, video slices) from unstructured robotic videos supports few-shot skill inference and dynamic expansion as new skills are requested during online planning (Xie et al., 3 Mar 2026).
Task/Scene/State Graph Coupling: Cross-linking symbolic (task) and physical (scene) information through an explicit state-graph layer supports knowledge-grounded planning, dynamic subtask adaptation (via LLMs), and physical-level parameter updates (via tactile extraction and geometric heuristics) in unstructured robotic environments (Qi et al., 2024).
Skill-Aware Representation Abstractions: Skill–symbol loops explicitly tie abstracted state representations to the available skills at each level by clustering according to initiation/effect sets, yielding provably correct plan abstractions and supporting recursive value-function lifting (Konidaris, 2015).
Policy Embedding and Retrieval: Graph-based structures enable embedding-based scoring for context-selective skill retrieval, supporting zero-shot transfer, compositional skill mixing, and continual expansion via online discovered skills (Xia et al., 9 Feb 2026, Zhang et al., 2023).
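
Embedding-based scoring for context-selective retrieval can be illustrated with a TransH-style distance, the model family mentioned earlier for typed edges. The sketch below is an assumption-laden toy: the vectors are hand-picked rather than learned, the function names (`transh_distance`, `retrieve`) are hypothetical, and the plain Euclidean norm is used as the score, with lower values meaning a more plausible (context, relation, skill) triple.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(v, unit_normal):
    """Project v onto the hyperplane whose unit normal is unit_normal."""
    k = dot(v, unit_normal)
    return [x - k * n for x, n in zip(v, unit_normal)]


def transh_distance(head, tail, normal, translation):
    """|| h_perp + d_r - t_perp ||_2 for one relation (normal, translation)."""
    length = math.sqrt(dot(normal, normal))
    w = [x / length for x in normal]  # normalize the hyperplane normal
    h, t = project(head, w), project(tail, w)
    return math.sqrt(sum((hi + di - ti) ** 2 for hi, di, ti in zip(h, translation, t)))


def retrieve(context, skills, normal, translation, k=1):
    """Return the k skill names with the smallest distance to the context."""
    ranked = sorted(
        skills,
        key=lambda name: transh_distance(context, skills[name], normal, translation),
    )
    return ranked[:k]


# Hand-picked 3-D embeddings for an "environment -> skill" relation.
normal = [0.0, 0.0, 1.0]       # relation hyperplane normal
translation = [1.0, 0.0, 0.0]  # relation translation within the hyperplane
environment = [0.0, 0.0, 5.0]
skills = {
    "push": [1.0, 0.0, -2.0],
    "pull": [-1.0, 0.0, 0.0],
    "lift": [0.0, 1.0, 3.0],
}

print(retrieve(environment, skills, normal, translation, k=2))  # ['push', 'lift']
```

Projecting onto a relation-specific hyperplane before translating lets the same skill embedding score differently under different relation types, which is the property that makes this family suitable for typed edges.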

6. Limitations, Open Questions, and Future Directions

Challenges remain in scaling, automation, and theoretical guarantees for hierarchical skill graph approaches:

Automated Discovery and Subgoal Induction: Identifying actionable, transferable skills and appropriate subgoal abstractions remains nontrivial in high-dimensional or continuous environments. Improvements in unsupervised option discovery and abstraction representations are needed (Evans et al., 2023, Konidaris, 2015).
Handling Unrelated Tasks and Open-World Diversity: Robustness to unrelated or adversarially distinct tasks requires continual skill graph expansion and semantic partitioning, as realized by dynamic skill graph models and incremental retrieval/fine-tuning protocols (Zhu et al., 9 Jul 2025, Xie et al., 3 Mar 2026).
Cross-Modal and Language-Guided Reasoning: Integrating LLM-based task decomposition and grounding into the skill-graph hierarchical control loop is nascent but crucial, especially for open-ended robotic manipulation and assembly (Mao et al., 2024, Qi et al., 2024, Xie et al., 3 Mar 2026).
Optimal Hierarchy Depth and Abstraction Selection: Determining the correct depth and construction of the hierarchy is problem-dependent; excessive depth may slow planning, while insufficient abstraction forfeits transfer gains (Konidaris, 2015).
Statistical and Theoretical Guarantees: While empirical successes are strong, formal bounds on the expressivity, sample complexity, or value approximation properties of learned hierarchical skill graphs outside tractable subgoal options remain incomplete (Evans et al., 2023, Konidaris, 2015).

A plausible implication is that future research will increasingly leverage a combination of graph-structured knowledge representation, generative model–based skill expansion, and multi-modal feedback to unify symbolic, sub-symbolic, and experiential skill learning in continually adaptive, interpretable, and scalable agent architectures.