
Generative Skill Chaining: Diffusion-Based Planning

Updated 26 March 2026
  • Generative Skill Chaining (GSC) is a probabilistic framework that uses diffusion generative models to learn parameterized skills for long-horizon manipulation tasks.
  • It composes independently trained skill priors via a product-of-experts approach, integrating classifier-based constraint guidance for efficient, constraint-aware planning.
  • GSC demonstrates improved success rates and sample efficiency in both simulated and real-world manipulation tasks, validated with robust diffusion model architectures.

Generative Skill Chaining (GSC) is a probabilistic framework for long-horizon manipulation planning that addresses the synthesis of complex action sequences from a learned library of parameterized skills. Unlike traditional search-based or greedy skill chaining, GSC composes skill-centric diffusion models in parallel to enable efficient, constraint-aware plan generation across previously unseen task instances. GSC prioritizes generalizability and sample efficiency by combining the expressivity of diffusion generative modeling with principled probabilistic inference and classifier-based constraint guidance (Mishra et al., 2023).

1. Formalization of the Long-Horizon Skill Planning Problem

GSC operates in a continuous state space $\mathcal S \subseteq \mathbb R^n$, where a state $s \in \mathcal S$ encodes the configuration of robotic agents and objects, typically with 6-DoF poses for each rigid body. The agent is endowed with a finite library $\Pi = \{\pi_1, \dots, \pi_M\}$ of parameterized skills, each mapping a continuous parameter $a \in \mathcal A_\pi \subseteq \mathbb R^{d_\pi}$ and a current state $s$ to a stochastic transition $s' \sim \mathcal T_\pi(s' \mid s, a)$.

Given at test time:

  • $s^{(0)}$: the initial state,
  • $G$: a goal predicate (e.g., $G(s^{(K)}) = 1$),
  • $\Phi = [(\pi_1, o_1), \dots, (\pi_K, o_K)]$: a “skeleton” specifying the ordered instantiation of $K$ skills and their associated objects,
  • $\Psi = \{h_j(\{s^{(i)}, a^{(i)}\}_{i \in I_j}) \ge 0\}_{j=1}^{J}$: differentiable geometric or logical constraints.

The planning objective is to produce a sequence of parameters $(a^{(0)}, \dots, a^{(K-1)})$ and corresponding states $(s^{(1)}, \dots, s^{(K)})$ satisfying:

  • $s^{(i+1)} \sim \mathcal T_{\pi_{i+1}}(s^{(i)}, a^{(i)})$ for all $i$,
  • $G(s^{(K)}) = 1$,
  • $h_j(\cdot) \ge 0$ for all $j$.

GSC assumes the symbolic skeleton $\Phi$ is given, and aims to efficiently infer the continuous parameters by leveraging diffusion-based skill priors.
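The test-time inputs above can be sketched as plain data structures; all names below are hypothetical illustrations, not from the paper's code:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple
import numpy as np

@dataclass
class Skill:
    """One parameterized skill pi with a stochastic transition s' ~ T_pi(. | s, a)."""
    name: str
    param_dim: int                              # d_pi, dimension of a in A_pi
    transition: Callable[[np.ndarray, np.ndarray, np.random.Generator], np.ndarray]

@dataclass
class PlanningProblem:
    """Test-time inputs (s^(0), Phi, Psi) of the GSC planning problem."""
    s0: np.ndarray                              # initial state s^(0)
    skeleton: List[Tuple[Skill, str]]           # Phi: ordered (skill, object) pairs
    constraints: List[Callable[..., float]] = field(default_factory=list)  # each h_j >= 0

    @property
    def horizon(self) -> int:
        return len(self.skeleton)               # K, number of chained skills

# Example: a two-step skeleton over a trivial "translate" skill.
translate = Skill(
    name="translate",
    param_dim=2,
    transition=lambda s, a, rng: s + a + 0.01 * rng.standard_normal(s.shape),
)
problem = PlanningProblem(
    s0=np.zeros(2),
    skeleton=[(translate, "block_1"), (translate, "block_2")],
)
```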

2. GSC Framework Overview

GSC is structured in two primary phases:

A. Offline Skill-Prior Training (per skill):

  • For each skill $\pi \in \Pi$, collect a dataset $\mathcal D_\pi = \{(s_t, a_t, s'_t)\}_{t=1}^{N_\pi}$ (about 5,000 samples per skill) from expert or exploration-driven demonstrations.
  • Train an unconditional diffusion model over concatenated triplets $x = (s, a, s') \in \mathbb R^{n + d_\pi + n}$.
  • Learn a time-indexed score network $\epsilon_\pi(x_t, t) \approx \sigma_t \nabla_{x_t} \log q_{\pi,t}(x_t)$ via denoising score matching.

B. Test-Time Chaining and Inference:

  • Given $(s^{(0)}, \Phi, \Psi)$, define the joint block vector $X = [s^{(0)}, a^{(0)}, s^{(1)}, a^{(1)}, \dots, a^{(K-1)}, s^{(K)}]$.
  • Formulate sampling from the composed skill priors as a product of experts, summing each independently trained skill score and adding differentiable constraint gradients.
  • Execute a parallel reverse diffusion chain of length $T$ over $X$ to sample a candidate trajectory.
  • Optionally apply a skill-success predictor $Q(s^{(i)}, s^{(i+1)})$ for post-sampling validation and local replanning.

3. Mathematical Foundations

3.1. Diffusion Model for Skill Priors:

For each skill, the forward SDE on data triplets $x_0 = (s, a, s')$ is

$$dx = -2\dot\sigma_t \sigma_t \nabla_x \log q_t(x)\, dt + \sqrt{2\dot\sigma_t \sigma_t}\, dw.$$

The score network $\epsilon_\pi(x_t, t)$ is trained to minimize

$$\mathcal L_\pi = \mathbb E_{x_0 \sim \mathcal D_\pi}\, \mathbb E_t \left\| \sigma_t \nabla_{x_t} \log q_{\pi,t}(x_t \mid x_0) - \epsilon_\pi(x_t, t) \right\|^2.$$

During inference, the denoised estimate

$$\tilde x_0 = x_t + \sigma_t^2 \nabla_{x_t} \log q_{\pi,t}(x_t) \approx x_t + \sigma_t\, \epsilon_\pi(x_t, t)$$

is computed, followed by reverse sampling.
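Concretely, for the Gaussian perturbation kernel $x_t = x_0 + \sigma_t z$ with $z \sim \mathcal N(0, I)$, the regression target $\sigma_t \nabla_{x_t} \log q_{\pi,t}(x_t \mid x_0)$ reduces to $(x_0 - x_t)/\sigma_t = -z$, so training is noise regression. A minimal NumPy sketch of this loss (function names are illustrative, not from the paper's code):

```python
import numpy as np

def dsm_loss(score_net, x0_batch, sigma_t, rng):
    """Denoising score-matching loss at one noise level sigma_t.

    With x_t = x_0 + sigma_t * z, the target sigma_t * grad log q(x_t | x_0)
    equals (x_0 - x_t) / sigma_t = -z: the network regresses the negated noise.
    """
    z = rng.standard_normal(x0_batch.shape)
    x_t = x0_batch + sigma_t * z
    target = (x0_batch - x_t) / sigma_t
    pred = score_net(x_t, sigma_t)
    return float(np.mean(np.sum((target - pred) ** 2, axis=-1)))

# Sanity check on toy data concentrated at the origin, where the optimal
# score network is exactly epsilon(x_t, t) = -x_t / sigma_t.
rng = np.random.default_rng(0)
x0 = np.zeros((256, 4))                 # flattened triplets (s, a, s')
optimal = lambda x_t, sigma: -x_t / sigma
loss = dsm_loss(optimal, x0, sigma_t=0.5, rng=rng)
```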

3.2. Skill Chaining via Product-of-Experts:

Let $X_t = (s_t^{(0)}, a_t^{(0)}, \dots, s_t^{(K)})$. The joint, unnormalized density is

$$p(X) \propto \left[ \prod_{i=1}^{K} q_{\pi_i}(s^{(i-1)}, a^{(i-1)}, s^{(i)}) \right] \left[ \prod_{i=1}^{K} \frac{1}{q_{\pi_i}(s^{(i)})} \right] \exp\left[ \sum_j \alpha_j \log h_j(X) \right].$$

The composite score at each reverse step $t$ is

$$\mathrm{Score}(X_t, t) = \sum_{i=1}^{K} \left[ \epsilon_{\pi_i}(s_t^{(i-1)}, a_t^{(i-1)}, s_t^{(i)}, t) - \gamma_i\, \epsilon_{\pi_i}(s_t^{(i)}, t) \right] + \sum_{j=1}^{J} \alpha_j \nabla_{X_t} \log h_j(X_t),$$

where $\gamma_i \in [0, 1]$ balances forward and backward consistency at each skill interface; $\gamma_i = 0.5$ is typical.
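The composed score can be assembled block by block over $X_t$. A minimal sketch, assuming each skill exposes its triplet score split per block and a marginal score over its outgoing state (interface names are hypothetical):

```python
import numpy as np
from types import SimpleNamespace

def composite_score(X_t, t, skills, gammas, constraint_grads=()):
    """Assemble the composed score over the block vector X_t (illustrative sketch).

    X_t is a list [s0, a0, s1, a1, ..., sK]; skills[i] exposes a triplet
    score (split per block) and a marginal score over its outgoing state.
    """
    K = len(skills)
    score = [np.zeros_like(v) for v in X_t]
    for i in range(K):
        s_prev, a, s_next = X_t[2 * i], X_t[2 * i + 1], X_t[2 * i + 2]
        d_prev, d_a, d_next = skills[i].triplet_score(s_prev, a, s_next, t)
        score[2 * i] = score[2 * i] + d_prev
        score[2 * i + 1] = score[2 * i + 1] + d_a
        # gamma_i-weighted marginal correction at the skill interface s^(i)
        score[2 * i + 2] = (score[2 * i + 2] + d_next
                            - gammas[i] * skills[i].marginal_score(s_next, t))
    for alpha_j, grad_log_h in constraint_grads:
        g = grad_log_h(X_t)            # same block layout as X_t
        score = [sc + alpha_j * gj for sc, gj in zip(score, g)]
    return score

# Example: K = 1 skill with a standard-normal prior on every block, so the
# triplet score is just the negated blocks; gamma = 0.5.
skill = SimpleNamespace(
    triplet_score=lambda s, a, sp, t: (-s, -a, -sp),
    marginal_score=lambda sp, t: -sp,
)
X = [np.array([2.0]), np.array([4.0]), np.array([6.0])]
S = composite_score(X, t=0, skills=[skill], gammas=[0.5])
# S = [[-2.0], [-4.0], [-6.0 + 0.5 * 6.0]] = [[-2.0], [-4.0], [-3.0]]
```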

3.3. Constraint Handling by Classifier Guidance:

For a soft constraint $h(X)$, classifier guidance augments the overall score with a term

$$\epsilon_\Psi(X_t, t) \propto \nabla_{X_t} \log h(\tilde X_0),$$

weighted by a hyperparameter $\alpha$.
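A minimal sketch of the guidance term, using a finite-difference gradient of $\log h$ and the common simplification of evaluating it at $X_t$ directly rather than differentiating through $\tilde X_0$ (function names are illustrative):

```python
import numpy as np

def guidance_grad(h, X, eps=1e-5):
    """Finite-difference estimate of grad_X log h(X) for a soft constraint h > 0.

    In practice one would backpropagate through the denoised estimate with
    automatic differentiation; this sketch evaluates the gradient at X itself.
    """
    X = np.asarray(X, dtype=float)
    g = np.zeros_like(X)
    base = np.log(h(X))
    for k in range(X.size):
        Xp = X.copy()
        Xp.flat[k] += eps
        g.flat[k] = (np.log(h(Xp)) - base) / eps
    return g

# Example: h(X) = exp(-||X||^2), so grad_X log h(X) = -2 X analytically.
h = lambda X: np.exp(-np.sum(X ** 2))
g = guidance_grad(h, np.array([0.3, -0.1]))
```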

4. Inference Algorithm and Planning Procedure

GSC's inference executes a single parallel reverse diffusion chain to sample full $K$-step plans. The procedure is:

Input: s^{(0)}, Φ = (π_1, ..., π_K), {ε_{π_i}}, {h_j, α_j}, T, {σ_t}, {γ_i}
Initialize: sample X_T ~ N(0, I)
for t = T down to 1:
    decompose X_t into (s_t^{(0)}, a_t^{(0)}, ..., s_t^{(K)})
    S_t = sum_i [ε_{π_i}(s_t^{(i-1)}, a_t^{(i-1)}, s_t^{(i)}, t) - γ_i * ε_{π_i}(s_t^{(i)}, t)]
          + sum_j α_j * ∇_{X_t} log h_j(tilde_X_0)
    tilde_X_0 = X_t + σ_t * S_t
    X_{t-1} ~ N(tilde_X_0, σ_{t-1}^2 I)
Return X_0 = (s^{(0)}, a^{(0)}, ..., s^{(K)})
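To make the loop concrete, here is a toy 1-D instantiation with $K = 2$ skills whose priors are unit-variance Gaussians, so the score networks are available in closed form; all numbers and names are illustrative, not the paper's setup:

```python
import numpy as np

# Plan vector X = (s0, a0, s1, a1, s2); skill 1 prefers triplet (0, 1, 1),
# skill 2 prefers (1, 1, 2), so a consistent chain is ~ (0, 1, 1, 1, 2).
rng = np.random.default_rng(0)
N, T = 400, 200                            # parallel samples, diffusion steps
sigmas = np.geomspace(0.01, 1.0, T + 1)    # sigma_t increases with t; sigma_T = 1
gamma = 0.5
mu1, mu2 = np.array([0.0, 1.0, 1.0]), np.array([1.0, 1.0, 2.0])

def eps_gauss(x, mu, sigma):
    # closed-form score-net output for a unit-variance Gaussian prior:
    # eps = sigma * grad log N(x; mu, (1 + sigma^2) I)
    return sigma * (mu - x) / (1.0 + sigma ** 2)

X = rng.standard_normal((N, 5))            # X_T ~ N(0, I)
for t in range(T, 0, -1):
    s = sigmas[t]
    S = np.zeros_like(X)
    S[:, 0:3] += eps_gauss(X[:, 0:3], mu1, s)      # skill-1 triplet score
    S[:, 2:5] += eps_gauss(X[:, 2:5], mu2, s)      # skill-2 triplet score
    # gamma-weighted marginal correction at each skill's outgoing state
    S[:, 2] -= gamma * eps_gauss(X[:, 2], mu1[2], s)
    S[:, 4] -= gamma * eps_gauss(X[:, 4], mu2[2], s)
    X0_tilde = X + s * S
    X = X0_tilde + sigmas[t - 1] * rng.standard_normal(X.shape)

plan = X.mean(axis=0)                      # centers near (0, 1, 1, 1, 2)
```

Averaging over the parallel samples here only checks that the composed sampler centers on a consistent chain; a real planner would validate individual samples with the success predictor instead.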

Post-processing involves validating sampled trajectories with an auxiliary skill-success predictor and, if a step fails, replanning from that step onward. This approach enables robust handling of perturbations and modular replanning.

5. Skill Prior Training and Model Architecture

For each skill $\pi$, training uses $N_\pi \approx 5000$ successful transitions $(s, a, s')$. Skill score networks adopt a DiT-style (transformer-based) architecture with:

  • Hidden size 128,
  • 4 blocks,
  • 4 attention heads,
  • MLP ratio 4,
  • Dropout 0.1.

Loss is the standard denoising score-matching objective. Training runs for approximately 100 epochs with the Adam optimizer and a learning rate of $10^{-4}$.

6. Experimental Evaluation and Comparative Analysis

GSC was evaluated on three long-horizon manipulation task suites: Hook Reach, Constrained Packing, and Rearrangement Push, with skeleton lengths 4–8. Each configuration was tested over 100 randomized environments. Baseline comparisons included:

  • Random CEM (uniform prior),
  • STAP (policy-CEM with learned prior),
  • DAF (generalization-oriented skill chaining).

Performance metrics indicate GSC achieves success rates comparable to or better than the baselines, with strictly lower search cost and no per-task retraining. For example, in Rearrangement Push Task 2 (6-step skeleton), success rates were:

  • Random CEM: 0.10,
  • STAP: 0.52,
  • GSC: 0.60.

Ablation experiments on constraint guidance demonstrated that adding task-specific constraints (e.g., maximizing inter-place pose separation) improved success from 0.50–0.80 (no guidance) to 1.00 on 2 out of 3 packing tasks.

Real-world hardware validation was performed on a Franka Panda arm with RealSense depth sensing and 6-DoF scene estimation via AprilTags. Plans generated by GSC in simulation were executed open-loop, and GSC demonstrated the capacity to replan in response to small object pose perturbations.

7. Strengths, Limitations, and Potential Extensions

Strengths:

  • Multi-modal generative capacity enables sampling of diverse, plausible skill transitions.
  • Compositionality: generalizes to arbitrary-length skill chains without retraining on full plans.
  • Parallel inference: joint sampling over all $K$ steps mitigates exponential search explosion.
  • Flexibility: constraint guidance enables task-constrained and geometry-aware planning at inference.

Limitations:

  • Assumes a given skeleton $\Phi$; symbolic skeleton generation (full TAMP) is out of scope.
  • Requires complete, low-dimensional state observability.
  • Depends on curated demonstration data for every primitive; does not support unsupervised skill discovery.
  • Trade-off between solution quality and inference time set by the number of diffusion steps.

Proposed Extensions:

  • Learning joint distributions over skeletons and continuous actions via macro diffusion models.
  • End-to-end pixel-level diffusion models for vision-to-skill planning.
  • Online adaptation of skill priors from observed failures in deployment.
  • Accounting for dynamics uncertainty through stochastic reverse SDEs.

GSC introduces a new methodology for long-horizon skill planning by training per-skill diffusion-based priors and composing them via product-of-experts and classifier guidance, providing generalizable, efficient, and constraint-aware manipulation planning without combinatorial search (Mishra et al., 2023).
