DynaMimicGen: Dynamic Data for Robot Learning
- DynaMimicGen is a scalable framework that generates large, diverse imitation datasets from few human demonstrations for dynamic robot manipulation tasks.
- It leverages dynamic movement primitives and automated subtask segmentation to produce robust, task-consistent trajectories even under real-time scene perturbations.
- Performance evaluations across tasks such as stacking, nut insertion, and mug placement demonstrate its superior data augmentation and policy training capabilities.
DynaMimicGen is a scalable, real-time data generation framework for efficient robot policy learning in dynamic manipulation environments. It enables the synthesis of large, high-diversity imitation learning datasets from a minimal set of human demonstrations, explicitly supporting adaptation to dynamic task settings. Utilizing Dynamic Movement Primitives (DMPs) and automated subtask segmentation, DynaMimicGen produces robust, task-consistent trajectory data that generalizes across variable scene layouts, object instances, and robot configurations, with validated advantages in both simulation and real-world robotic applications (Pomponi et al., 20 Nov 2025).
1. Problem Scope and Pipeline Overview
DynaMimicGen ("D-MG") is designed to address the data bottleneck in training robust manipulation policies for contact-rich and long-horizon robot tasks, where collecting large-scale, human-curated demonstrations is costly or impractical. D-MG operates as follows:
- Given a small dataset (one or two human demonstrations) of a manipulation task, modeled as an MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$ (absolute end-effector pose and binary gripper control), D-MG generates a large, diverse dataset suitable for imitation policy training.
- The pipeline: (1) Collection of a few demonstrations, (2) Automatic segmentation into object-centric subtasks, (3) Fitting a DMP to each subtask, (4) Generating new rollouts for varied scene configurations by dynamically adapting DMPs at execution time (object pose, scene geometry, robot initial state), (5) Real-time rollout abortion if subtask constraints are violated.
This approach enables data augmentation far beyond the scale of the original demonstrations, supporting both static and dynamic (mid-execution scene perturbation) settings.
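The five-step pipeline can be caricatured in a few lines of Python. This is a toy sketch with hypothetical helper names (`segment_into_subtasks`, `fit_dmp`), not the framework's actual API: a "demo" is a list of scalar end-effector samples and a "DMP" is reduced to its goal point.

```python
import random

# Toy stand-ins for the full framework (hypothetical names): a "demo" is a
# list of end-effector samples, a "DMP" is reduced to its goal point.
def segment_into_subtasks(demo, boundaries):
    cuts = [0] + boundaries + [len(demo)]
    return [demo[a:b] for a, b in zip(cuts[:-1], cuts[1:])]

def fit_dmp(segment):
    return {"goal": segment[-1]}              # placeholder for learned weights

def generate_dataset(demos, boundaries, n_rollouts=10):
    dmps = [fit_dmp(seg) for demo in demos
            for seg in segment_into_subtasks(demo, boundaries)]  # steps 2-3
    dataset = []
    while len(dataset) < n_rollouts:
        # Step 4: adapt each subtask DMP to a randomized scene; here the
        # adaptation is caricatured as jittering the goal (a new object pose).
        rollout = [d["goal"] + random.uniform(-0.1, 0.1) for d in dmps]
        dataset.append(rollout)               # step 5 would discard failures
    return dataset
```

The real system replaces the goal jitter with full DMP re-targeting and aborts rollouts that violate subtask constraints.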
2. Demonstration Segmentation and Subtask Decomposition
DynaMimicGen assumes the task decomposes into a fixed, ordered sequence of object-centric subtasks $(o_1, \dots, o_M)$. In simulation, subtask boundaries are detected through event signals (e.g., gripper closure for grasp completion, collision/contact metrics for placement), whereas in the real world, boundaries can be manually annotated or inferred via simple heuristics such as joint position thresholds.
Segmentation Algorithm:

```
Input:  τ = {s_t, a_t : t = 1…T}, subtask order [o₁, …, o_M]
Output: τ₁ … τ_M
t₀ = 1
for i in 1…M:
    find t_i > t_{i-1} such that event_i(s_{t_i}, a_{t_i}) == true
    τ_i = τ[t_{i-1} : t_i]
```
This process produces trajectory segments each mapped to a single object manipulation primitive.
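The segmentation loop above can be sketched as a runnable Python function. The event predicates are illustrative stand-ins for the paper's simulation event signals; segment boundaries here are inclusive of the triggering timestep.

```python
def segment_demo(states, actions, events):
    """Split one demo into M subtask segments using event predicates.

    events: ordered predicates event_i(s, a) -> bool that fire when
    subtask i completes (e.g. gripper closure signals grasp completion).
    """
    pairs = list(zip(states, actions))
    segments, start = [], 0
    for event in events:
        t = start
        while t < len(pairs) - 1 and not event(*pairs[t]):
            t += 1
        segments.append(pairs[start:t + 1])   # include the triggering step
        start = t + 1
    return segments

# 1-D example: action == 1 means "gripper closed" (ends the grasp subtask);
# reaching s >= 0.4 ends the placement subtask.
states  = [0.0, 0.1, 0.2, 0.3, 0.4]
actions = [0, 0, 1, 0, 0]
segs = segment_demo(states, actions,
                    [lambda s, a: a == 1, lambda s, a: s >= 0.4])
```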
3. Dynamic Movement Primitives: Parameterization and Adaptation
For each subtask, DynaMimicGen fits a DMP to the demonstrated trajectory segment. The DMP formulation, based on transformation and canonical systems, facilitates trajectory modulation toward new goals and scene variations:
- Canonical system (phase variable $x$): $\tau \dot{x} = -\alpha_x x$, initialized at $x(0) = 1$ so that the phase decays monotonically over the movement.
- Transformation system (Cartesian position $y$, goal $g$): $\tau \dot{z} = \alpha_z \big( \beta_z (g - y) - z \big) + f(x)$, with $\tau \dot{y} = z$.
- Forcing function: $f(x) = \frac{\sum_i \psi_i(x)\, w_i}{\sum_i \psi_i(x)}\, x \,(g - y_0)$, where $\psi_i(x) = \exp\big(-h_i (x - c_i)^2\big)$ are Gaussian basis functions with learned weights $w_i$.
Learning proceeds via Locally Weighted Regression, optimizing the match between demo forcing terms and basis-weighted activations.
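A minimal 1-D sketch of this fitting step, assuming the standard Ijspeert-style DMP formulation: the target forcing term is recovered from the demonstrated trajectory by inverting the transformation system, and each weight is solved with the per-basis LWR closed form. Gains and basis placement are illustrative, not the paper's values.

```python
import math

def fit_dmp_weights(y, dt, n_basis=10, alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """Fit forcing-term weights w_i by locally weighted regression (1-D sketch)."""
    T = len(y)
    tau = (T - 1) * dt
    g, y0 = y[-1], y[0]
    # Finite-difference velocities and accelerations of the demo.
    yd  = [(y[min(t+1, T-1)] - y[max(t-1, 0)]) / (2*dt) for t in range(T)]
    ydd = [(yd[min(t+1, T-1)] - yd[max(t-1, 0)]) / (2*dt) for t in range(T)]
    # Phase from the canonical system: x(t) = exp(-alpha_x * t / tau).
    xs = [math.exp(-alpha_x * (t*dt) / tau) for t in range(T)]
    # Target forcing term: transformation system solved for f.
    f_tgt = [tau*tau*ydd[t] - alpha_z*(beta_z*(g - y[t]) - tau*yd[t])
             for t in range(T)]
    # Basis centers spaced along the phase, widths chosen heuristically.
    centers = [math.exp(-alpha_x * i / (n_basis - 1)) for i in range(n_basis)]
    widths  = [n_basis**1.5 / c for c in centers]
    w = []
    for i in range(n_basis):
        psi = [math.exp(-widths[i]*(x - centers[i])**2) for x in xs]
        s   = [x*(g - y0) for x in xs]          # LWR regressor x(g - y0)
        num = sum(p*si*f for p, si, f in zip(psi, s, f_tgt))
        den = sum(p*si*si for p, si in zip(psi, s)) + 1e-10
        w.append(num / den)                     # per-basis closed-form LWR
    return w, centers, widths
```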
Adaptation for New Scenes: At each rollout, DMPs are recomputed for the current end-effector start pose and a new goal $g$, derived by re-projecting the demonstrated end-effector-to-object transform into the new object pose. Temporal scaling may be adapted to relative distances, and the learned DMP weights are reused to preserve trajectory shape.
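The goal re-projection can be illustrated in the plane: the end-effector-to-object transform observed in the demo is held fixed and composed with the new object pose. A planar SE(2) sketch with poses as (x, y, yaw); the paper presumably works in full SE(3).

```python
import math

def reproject_goal(goal_demo, obj_demo, obj_new):
    """Re-project the demonstrated goal into a new object pose (planar sketch).

    Poses are (x, y, theta). The relative goal-to-object transform seen in
    the demo is kept fixed and re-applied at the new object pose.
    """
    # Express the demo goal in the demo object's frame.
    dx, dy = goal_demo[0] - obj_demo[0], goal_demo[1] - obj_demo[1]
    c, s = math.cos(-obj_demo[2]), math.sin(-obj_demo[2])
    local = (c*dx - s*dy, s*dx + c*dy, goal_demo[2] - obj_demo[2])
    # Re-express that relative pose in the new object's frame.
    c, s = math.cos(obj_new[2]), math.sin(obj_new[2])
    return (obj_new[0] + c*local[0] - s*local[1],
            obj_new[1] + s*local[0] + c*local[1],
            obj_new[2] + local[2])
```

For example, if the object translates by (2, 0) with no rotation, the goal translates with it; if the object rotates 90°, the goal orbits around it accordingly.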
4. Real-Time Generation and On-the-Fly Adaptation
At execution, DynaMimicGen operates in real time, adapting to dynamic object poses at each control tick (20 Hz):
- Sense the current object pose.
- Update the DMP goal $g$ by re-projecting the demonstrated relative transform onto the sensed pose.
- Step the DMP integrator toward the updated $g$.
- Emit the Cartesian command and accompanying gripper action.
- Repeat until the DMP phase variable $x$ has decayed (nominally $x \to 0$).
This mechanism allows the policy to "bend" its trajectory in response to perturbations, enabling robust imitation even under unpredictable changes.
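This per-tick adaptation can be demonstrated with a scalar DMP whose goal is moved mid-rollout. A minimal sketch: the learned forcing term is omitted for brevity (it only shapes the transient), and the gains are illustrative.

```python
def dmp_step(state, goal, dt, tau, alpha_z=25.0, beta_z=6.25, alpha_x=1.0):
    """One control tick of the DMP integrator; `goal` may move between ticks.

    state = (x, y, z): phase, position, scaled velocity. The learned
    forcing term is omitted here for brevity.
    """
    x, y, z = state
    zd = (alpha_z * (beta_z * (goal - y) - z)) / tau
    yd = z / tau
    xd = -alpha_x * x / tau
    return (x + xd * dt, y + yd * dt, z + zd * dt)

# 20 Hz rollout in which the goal jumps mid-execution, mimicking an
# object that is moved while the robot is acting.
state, goal, dt, tau = (1.0, 0.0, 0.0), 1.0, 0.05, 1.0
for tick in range(200):            # 10 s at 20 Hz
    if tick == 60:                 # perturbation at t = 3 s
        goal = -0.5
    state = dmp_step(state, goal, dt, tau)
# state[1] now tracks the perturbed goal (≈ -0.5)
```

Because the goal enters the transformation system at every tick, the trajectory "bends" smoothly toward the new target rather than replanning from scratch.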
5. Dataset Synthesis and Diversity
DynaMimicGen generates 1000 successful rollouts per task variant and demonstration count, spanning various scene layouts, object perturbations, and robot initializations. Example domains include:
- Stack: Block stacking across workspaces of varying size.
- Square: Nut/peg insertion in boxes with varied size/orientation.
- MugCleanup: Drawer and mug placement tasks over multiple scene arrangements.
Dynamic scenarios introduce controlled object pose perturbations during execution. Robot configurations include simulated Sawyer/Franka arms and real-world Franka Panda executions.
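A controlled perturbation of this kind can be sampled as a bounded in-plane offset. The ranges below are illustrative, not the paper's actual values.

```python
import random

def perturb_pose(pose, max_xy=0.05, max_yaw=0.3):
    """Sample a bounded in-plane perturbation of an object pose (x, y, yaw).

    Illustrative bounds only; the paper's exact ranges are not given here.
    """
    x, y, yaw = pose
    return (x + random.uniform(-max_xy, max_xy),
            y + random.uniform(-max_xy, max_xy),
            yaw + random.uniform(-max_yaw, max_yaw))

random.seed(0)
perturbed = [perturb_pose((0.4, 0.0, 0.0)) for _ in range(3)]
```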
6. Imitation Policy Training and Evaluation
Policies are trained via both behavior cloning (BC-RNN, as in RoboMimic) and Diffusion Policy (DP):
- State/observation spaces: Image-based (dual 84 × 84 RGB cameras) and low-dimensional (end-effector pose, gripper, ground-truth object pose).
- Action space: Absolute end-effector pose (normalized) and gripper.
- Training regime: BC-RNN is trained for up to 600 epochs and DP for up to 2000 epochs. Early stopping is based on success rate.
- Evaluation: Maximum checkpointed success over 50 rollouts, averaged across 3 random seeds.
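The evaluation metric described above (best checkpoint per seed, averaged across seeds) can be computed directly; a small sketch:

```python
def eval_score(success_by_seed):
    """Best-checkpoint success rate per seed, averaged across seeds.

    success_by_seed: {seed: [success_rate_at_checkpoint_0, ...]},
    where each rate is measured over 50 evaluation rollouts.
    """
    per_seed_best = [max(ckpts) for ckpts in success_by_seed.values()]
    return sum(per_seed_best) / len(per_seed_best)

# Three seeds, three checkpoints each; bests are 0.8, 0.9, 0.75.
score = eval_score({0: [0.5, 0.8, 0.7],
                    1: [0.6, 0.9, 0.85],
                    2: [0.4, 0.7, 0.75]})
```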
Performance results:
| Task | Method | D0 (%) | D1 (%) | D2 (%) |
|---|---|---|---|---|
| Stack | DP-D-MG | 83.3 | 70.7 | – |
| Stack | DP-MG | 74.0 | 65.3 | – |
| Stack | BC-D-MG | 76.0 | 71.3 | – |
| Stack | BC-MG | 72.7 | 58.0 | – |
| Square | DP-D-MG | 86.7 | 46.7 | 23.3 |
| Square | DP-MG | 75.3 | 24.7 | 12.0 |
| Square | BC-D-MG | 97.3 | 50.7 | 37.3 |
| Square | BC-MG | 88.7 | 44.0 | 34.0 |
| MugCleanup | DP-D-MG | 90.0 | 60.7 | – |
| MugCleanup | DP-MG | 61.3 | 38.0 | – |
| MugCleanup | BC-D-MG | 79.3 | 58.7 | – |
| MugCleanup | BC-MG | 65.3 | 31.3 | – |
Data generation rates (DGR) and policy successes demonstrate that DynaMimicGen exceeds or matches prior baselines (MG) in static settings and is uniquely effective in dynamic environments.
7. Failure Modes, Limitations, and Extensions
Identified limitations include:
- Assumption of a known, fixed sequence of object-centric subtasks.
- Requirement for accurate, real-time ground-truth object poses; sensitivity to sensor noise or occlusions.
- Handling of only one reference object per subtask; lack of multi-object or explicit obstacle consideration.
- Fixed DMP durations can be limiting under late perturbations.
- Focus on single-arm setups; bimanual coordination not supported.
- Real-world policy generalization remains limited by dataset size and perception system calibration.
- Sim-to-real robustness and large-scale visual diversity require further domain randomization.
Future directions include multi-object DMP extensions, integration with learned visual perception modules for markerless operation, bimanual and multi-agent generalization, enhanced sim-to-real transfer (domain/dynamics randomization), and incorporation of foundation models or LLMs for automatic subtask extraction.
References
- "DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks" (Pomponi et al., 20 Nov 2025)