KStar Diffuser: A Kinematic Diffusion Framework
- KStar Diffuser is a kinematics-enhanced conditional diffusion framework that employs dynamic spatial-temporal graph encoding to generate feasible bimanual manipulation trajectories.
- It leverages a denoising diffusion probabilistic model and a differentiable forward kinematics module to substantially reduce self-collision and inverse kinematics failures.
- Empirical evaluations demonstrate that KStar Diffuser markedly increases manipulation success rates and safety in both simulation and real-world settings.
KStar Diffuser is a conditional diffusion policy framework for bimanual robotic manipulation that integrates spatial-temporal graph structure encoding with differentiable kinematic constraints. This system addresses the shortcomings of previous end-to-end imitation learning approaches for bimanual robots, which lack explicit modeling of the robot’s joint interrelations and kinematic feasibility. KStar Diffuser expressly incorporates robot structure by constructing a dynamic spatial-temporal graph and enforces kinematics awareness through an optimizable forward kinematics module, resulting in substantially higher manipulation success rates and dramatically reduced self-collision and inverse kinematics failures in both simulation and real-world evaluation settings (Lv et al., 13 Mar 2025). The name "KStar" refers to Kinematics-enhanced Spatial-TemporAl gRaph Diffuser.
1. Diffusion Policy Architecture
KStar Diffuser formulates robot action generation as conditional score-based sampling using a denoising diffusion probabilistic model (DDPM), conditioned on high-dimensional multimodal observations and graph-structured robot state.
- Let $a_0$ represent the clean bimanual keyframe end-effector poses; $c$ denotes the conditioning observation, e.g., multiview RGB-D and language.
- The forward (noising) process applies a Markov chain with a variance schedule $\{\beta_k\}_{k=1}^{K}$:
$$q(a_k \mid a_{k-1}) = \mathcal{N}\!\left(a_k;\ \sqrt{1-\beta_k}\,a_{k-1},\ \beta_k I\right),$$
where $\alpha_k = 1 - \beta_k$ and $\bar{\alpha}_k = \prod_{s=1}^{k} \alpha_s$.
- Marginally: $q(a_k \mid a_0) = \mathcal{N}\!\left(a_k;\ \sqrt{\bar{\alpha}_k}\,a_0,\ (1-\bar{\alpha}_k) I\right)$.
- The reverse (denoising) process samples $a_{k-1} \sim p_\theta(a_{k-1} \mid a_k, c)$ by predicting the added noise via a neural network $\epsilon_\theta$:
$$p_\theta(a_{k-1} \mid a_k, c) = \mathcal{N}\!\left(a_{k-1};\ \mu_\theta(a_k, k, c),\ \sigma_k^2 I\right),$$
with
$$\mu_\theta(a_k, k, c) = \frac{1}{\sqrt{\alpha_k}}\left(a_k - \frac{\beta_k}{\sqrt{1-\bar{\alpha}_k}}\,\epsilon_\theta(a_k, k, c)\right).$$
- Training uses the simplified DDPM loss:
$$\mathcal{L}_{\text{diff}} = \mathbb{E}_{a_0,\,\epsilon \sim \mathcal{N}(0,I),\,k}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_k}\,a_0 + \sqrt{1-\bar{\alpha}_k}\,\epsilon,\ k,\ c\right)\right\rVert^2\right].$$
This approach allows for sampling of kinematically feasible action trajectories, with the conditioning embedding detailed below.
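The forward-noising and simplified-loss computations above can be sketched in a few lines of NumPy. Everything here is an illustrative placeholder rather than the paper's configuration: the action dimension, the number of steps `K`, the linear schedule values, and `dummy_model` (which stands in for the learned conditional denoiser $\epsilon_\theta$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two 7-DoF end-effector keyframes flattened to 14 dims.
ACTION_DIM = 14
K = 100  # number of diffusion steps (illustrative, not the paper's value)

# Linear variance schedule beta_1..beta_K and its cumulative products.
betas = np.linspace(1e-4, 0.02, K)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(a0, k, eps):
    """Forward process marginal: a_k = sqrt(abar_k) a_0 + sqrt(1-abar_k) eps."""
    ab = alpha_bars[k]
    return np.sqrt(ab) * a0 + np.sqrt(1.0 - ab) * eps

def ddpm_loss(eps_model, a0, cond):
    """Simplified DDPM objective: || eps - eps_theta(a_k, k, cond) ||^2."""
    k = int(rng.integers(0, K))           # random diffusion step
    eps = rng.standard_normal(a0.shape)   # sampled Gaussian noise
    a_k = q_sample(a0, k, eps)
    pred = eps_model(a_k, k, cond)
    return np.mean((eps - pred) ** 2)

# Stand-in "network" that ignores its inputs; a real denoiser is learned.
dummy_model = lambda a_k, k, cond: np.zeros_like(a_k)
a0 = rng.standard_normal(ACTION_DIM)
loss = ddpm_loss(dummy_model, a0, cond=None)
```

With the zero-predicting stand-in model, the loss is simply the mean squared norm of the sampled noise; a trained denoiser would drive it toward zero.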
2. Dynamic Spatial–Temporal Graph Representation
To incorporate the robot's physical structure, KStar Diffuser encodes the system as a dynamic spatial-temporal (ST) graph, capturing interaction patterns at both spatial and temporal levels:
- The spatial graph is parsed from the robot’s URDF, with nodes for each joint and edges for physically linked joints.
- Node features for each joint node $v_i$ combine:
  - Workspace-normalized joint coordinates
  - Pairwise joint-to-joint distances
  - A body-side (left/right arm) one-hot label
- The temporal graph stacks the past $T$ timesteps, with temporal edges connecting each joint to itself across consecutive timesteps.
- The composite graph is processed by GCN layers to yield per-node embeddings and a pooled global graph embedding $g$.
- The denoiser then receives the visual-language features $f_{vl}$, the spatial-temporal graph feature $g$, and the kinematic reference as conditioning, i.e., the concatenation of these embeddings forms $c$.
This explicit structural encoding enables collision avoidance and models coordination constraints inherent to bimanual systems.
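The graph construction and a single GCN layer can be sketched as follows. The kinematic tree (`parents`), feature sizes, and timestep count are hypothetical toy values, not the robot or hyperparameters from the paper; the layer uses the standard symmetric-normalized GCN propagation rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical kinematic tree: 7 joints per arm; parent[i] = -1 marks a root.
parents = [-1, 0, 1, 2, 3, 4, 5,      # left arm chain
           -1, 7, 8, 9, 10, 11, 12]   # right arm chain
J, T = len(parents), 4                # joints, stacked past timesteps

# Spatial edges from the URDF-style parent list, replicated per timestep.
A = np.zeros((J * T, J * T))
for t in range(T):
    off = t * J
    for j, p in enumerate(parents):
        if p >= 0:
            A[off + j, off + p] = A[off + p, off + j] = 1.0
# Temporal edges: each joint connected to itself at the next timestep.
for t in range(T - 1):
    for j in range(J):
        A[t * J + j, (t + 1) * J + j] = A[(t + 1) * J + j, t * J + j] = 1.0

def gcn_layer(H, A, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(len(A))          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

# Toy node features: 3-D coords + a distance feature + 2-D body-side one-hot.
F_IN, F_OUT = 6, 16
H = rng.standard_normal((J * T, F_IN))
W = rng.standard_normal((F_IN, F_OUT)) * 0.1
H1 = gcn_layer(H, A, W)   # per-node embeddings
g = H1.mean(axis=0)       # pooled global graph embedding
```

Mean pooling over nodes is one simple choice for the global readout; the paper's exact pooling operator may differ.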
3. Differentiable Kinematic Modeling
KStar Diffuser integrates a differentiable forward kinematics module to regularize the action outputs with respect to physical feasibility:
- Let $q$ represent the joint angles; the forward kinematics mapping $\mathrm{FK}(q)$ maps joint angles to the end-effector pose.
- The network projects the concatenated backbone and graph features to a predicted joint configuration $\hat{q}$ and computes $\mathrm{FK}(\hat{q})$.
- The kinematic loss is
$$\mathcal{L}_{\text{kin}} = \left\lVert \hat{q} - q^{*} \right\rVert^2 + \left\lVert \mathrm{FK}(\hat{q}) - \mathrm{FK}(q^{*}) \right\rVert^2,$$
where $q^{*}$ is the ground-truth joint trajectory, enforcing joint-level accuracy and forward kinematic validity.
- Gradients propagate through the kinematic Jacobian $\partial\,\mathrm{FK}/\partial q$, ensuring the model learns to produce kinematically valid, physically executable trajectories compatible with the robot's structure.
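To make the idea concrete, here is a minimal sketch for a planar 2-link arm rather than the paper's bimanual robot: a closed-form FK map, its analytic Jacobian (the quantity through which gradients would flow), and a kinematic loss combining joint-space and task-space error as described above (the exact weighting in the paper may differ). Link lengths and angles are arbitrary toy values.

```python
import numpy as np

def fk_planar_2link(q, l1=0.3, l2=0.25):
    """Forward kinematics of a planar 2-link arm: joint angles -> EE position."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def fk_jacobian(q, l1=0.3, l2=0.25):
    """Analytic Jacobian d FK / d q, through which gradients propagate."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def kinematic_loss(q_pred, q_gt):
    """Joint-space error plus FK (task-space) error; weighting is illustrative."""
    return (np.sum((q_pred - q_gt) ** 2)
            + np.sum((fk_planar_2link(q_pred) - fk_planar_2link(q_gt)) ** 2))

q_gt = np.array([0.4, -0.7])
q_pred = np.array([0.5, -0.6])
loss = kinematic_loss(q_pred, q_gt)

# Sanity-check the analytic Jacobian against central finite differences.
eps = 1e-6
num_jac = np.column_stack([
    (fk_planar_2link(q_gt + eps * e) - fk_planar_2link(q_gt - eps * e)) / (2 * eps)
    for e in np.eye(2)])
```

In an autodiff framework the Jacobian never needs to be written by hand; it falls out of backpropagating through the FK computation, which is exactly what makes the module differentiable end to end.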
4. Training and Optimization Strategy
The training objective linearly combines the diffusion and kinematic losses:
$$\mathcal{L} = \mathcal{L}_{\text{diff}} + \lambda\,\mathcal{L}_{\text{kin}},$$
where the weight $\lambda$ is chosen empirically for maximal imitation performance. The diffusion process uses a fixed number of denoising steps with a single denoising iteration per batch, optimized with AdamW (batch size 64, 150k steps). The vision and language backbone encoders operate jointly with the spatial-temporal GCN and kinematics head, supporting end-to-end learning and inference (Lv et al., 13 Mar 2025).
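As a small illustration of the combined objective and the optimizer, the following sketch minimizes a toy scalar loss $\mathcal{L} = \mathcal{L}_{\text{diff}} + \lambda \mathcal{L}_{\text{kin}}$ with a from-scratch AdamW update (Adam moments plus decoupled weight decay). The quadratic surrogate losses, $\lambda$, learning rate, and step count are all illustrative; the paper trains a full network on the real losses.

```python
import numpy as np

def adamw_step(theta, grad, state, lr=1e-2, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.0):
    """One AdamW update: bias-corrected Adam moments + decoupled weight decay."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)

# Toy stand-ins for the two loss terms, as functions of a scalar "weight" w.
lam = 0.1
l_diff = lambda w: (w - 1.0) ** 2
l_kin = lambda w: (w - 3.0) ** 2
grad_total = lambda w: 2 * (w - 1.0) + lam * 2 * (w - 3.0)

w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(2000):
    w = adamw_step(w, grad_total(w), state)
# w converges near the minimizer of l_diff + lam * l_kin, i.e. 2.6 / 2.2.
```

The decoupled weight-decay term (here disabled via `wd=0.0`) is what distinguishes AdamW from plain Adam with L2 regularization.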
5. Empirical Evaluation and Comparative Results
KStar Diffuser demonstrates markedly increased performance and safety over prior methods such as DP-EE and PerAct2:
| Setting | # Demos / Trials | KStar Diffuser | DP-EE (best baseline) | PerAct2 |
|---|---|---|---|---|
| RLBench2 simulation | 20 demos | 58.0% | ~17.0% | (not reported) |
| RLBench2 simulation | 100 demos | 68.2% | ~40.5% | <35% |
| Real-world (2 tasks) | 15 trials | 43.1% | N/A | 29.9% |
| Handover_item (sim) | -- | 23.4% | N/A | -- |
| Ablation: w/o kinematics | -- | 16.8% | N/A | -- |
| Ablation: w/o ST graph + kinematics | -- | 14.8% | N/A | -- |
Additional qualitative outcomes:
- Self-collisions and inter-arm collisions approach zero with the spatial-temporal graph, compared to approximately 15% collision rate in PerAct2.
- Kinematic infeasibility (IK solver failure) below 5% for KStar Diffuser, versus around 20% for diffusion models lacking kinematic loss.
These results indicate substantial (>20 percentage point) improvements in manipulation success rate, combined with dramatically improved safety and feasibility characteristics.
6. Implications and Distinctions
KStar Diffuser’s integration of explicit robot spatial-temporal modeling and kinematic regularization addresses the two main pathologies of prior imitation learning systems for bimanual manipulation: neglect of the robot’s joint dependencies (leading to high collision/interference rates), and generation of infeasible action trajectories (yielding high inverse-kinematics solver failure rates) (Lv et al., 13 Mar 2025). A plausible implication is that the framework can generalize to other multi-limbed and high-DOF robotic systems where structural coordination and feasibility are critical to task performance.
KStar Diffuser departs from prior diffusion-based and transformer-based policy architectures for manipulation in its coupling of dynamic graph encodings, physical robot model constraints, and multimodal observation conditioning, yielding both empirical robustness and improved physical realizability in action policy synthesis.