InsActor: Instruction-driven Physics-based Characters (2312.17135v1)
Abstract: Generating animations of physics-based characters under intuitive control has long been a desirable goal with numerous applications. However, generating physically simulated animations that reflect high-level human instructions remains difficult due to the complexity of physical environments and the richness of human language. In this paper, we present InsActor, a principled generative framework that leverages recent advances in diffusion-based human motion models to produce instruction-driven animations of physics-based characters. InsActor captures the complex relationship between high-level human instructions and character motions by employing diffusion policies for flexibly conditioned motion planning. To overcome invalid states and infeasible state transitions in planned motions, InsActor discovers low-level skills and maps plans to latent skill sequences in a compact latent space. Extensive experiments demonstrate that InsActor achieves state-of-the-art results on various tasks, including instruction-driven motion generation and instruction-driven waypoint heading. Notably, its ability to generate physically simulated animations from high-level human instructions makes InsActor a valuable tool, particularly for executing long-horizon tasks with a rich set of instructions.
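The hierarchy the abstract describes (a high-level diffusion policy that plans a state trajectory from an instruction, then a low-level module that maps planned transitions into latent skills and actions) can be illustrated with a toy sketch. This is not the paper's implementation: the networks are replaced by fixed random projections, and all names (`denoise_step`, `plan_states`, `states_to_skills`, dimensions) are hypothetical stand-ins chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, ACTION_DIM, HORIZON, STEPS = 8, 4, 6, 16, 10

# Hypothetical frozen weights standing in for trained networks.
W_cond = rng.standard_normal((STATE_DIM, STATE_DIM)) * 0.1       # instruction conditioning
W_enc = rng.standard_normal((2 * STATE_DIM, LATENT_DIM)) * 0.1   # skill encoder
W_dec = rng.standard_normal((LATENT_DIM, ACTION_DIM)) * 0.1      # skill decoder

def denoise_step(x, t, instr_emb):
    """Toy 'denoiser': pull the noisy plan toward an instruction-conditioned target."""
    target = np.tanh(instr_emb @ W_cond)            # (STATE_DIM,)
    alpha = (STEPS - t) / STEPS                     # anneal injected noise over the schedule
    return x + 0.3 * (target - x) + 0.05 * alpha * rng.standard_normal(x.shape)

def plan_states(instr_emb):
    """High-level diffusion-style planner: iteratively denoise a state trajectory."""
    x = rng.standard_normal((HORIZON, STATE_DIM))   # start from pure noise
    for t in range(STEPS):
        x = denoise_step(x, t, instr_emb)
    return x

def states_to_skills(states):
    """Map each planned transition (s_t, s_{t+1}) to a compact latent skill code."""
    pairs = np.concatenate([states[:-1], states[1:]], axis=1)  # (HORIZON-1, 2*STATE_DIM)
    return np.tanh(pairs @ W_enc)

def skills_to_actions(skills):
    """Low-level decoder: turn latent skills into simulator actions."""
    return np.tanh(skills @ W_dec)

instr_emb = rng.standard_normal(STATE_DIM)          # stand-in for a language embedding
plan = plan_states(instr_emb)
actions = skills_to_actions(states_to_skills(plan))
print(plan.shape, actions.shape)                    # (16, 8) (15, 6)
```

The key design point the sketch mirrors is that the planner never emits actions directly: it plans in state space, and feasibility is enforced by routing every transition through the learned skill latent space before actuation.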
Authors: Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Xiao Ma, Liang Pan, Ziwei Liu