DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
Abstract: Imitation learning from human hand motion data presents a promising avenue for imbuing robots with human-like dexterity in real-world manipulation tasks. Despite this potential, substantial challenges persist, particularly with the portability of existing hand motion capture (mocap) systems and the complexity of translating mocap data into effective robotic policies. To tackle these issues, we introduce DexCap, a portable hand motion capture system, alongside DexIL, a novel imitation algorithm for training dexterous robot skills directly from human hand mocap data. DexCap offers precise, occlusion-resistant tracking of wrist and finger motions based on SLAM and electromagnetic fields, together with 3D observations of the environment. Using this rich dataset, DexIL employs inverse kinematics and point cloud-based imitation learning to replicate human actions with robot hands. Beyond direct learning from human motion, DexCap also offers an optional human-in-the-loop correction mechanism during policy rollouts to further improve task performance. Through extensive evaluation across six challenging dexterous manipulation tasks, our approach not only demonstrates superior performance but also showcases the system's capability to learn effectively from in-the-wild mocap data, paving the way for future data collection methods in the pursuit of human-level robot dexterity. More details can be found at https://dex-cap.github.io
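The abstract names inverse kinematics as the step that maps captured human fingertip motion onto a robot hand. As a rough illustration of that idea only, the sketch below solves for the joint angles of a hypothetical planar three-link finger so that its fingertip reaches a target position, using damped least-squares iterations. The link lengths, the planar finger model, the solver choice, and all function names are assumptions made for this example; this is not the DexIL implementation.

```python
import numpy as np


def fingertip_position(joint_angles, link_lengths):
    """Forward kinematics of a planar serial chain: returns the (x, y) fingertip."""
    link_lengths = np.asarray(link_lengths, dtype=float)
    cumulative = np.cumsum(joint_angles)  # absolute orientation of each link
    x = np.sum(link_lengths * np.cos(cumulative))
    y = np.sum(link_lengths * np.sin(cumulative))
    return np.array([x, y])


def jacobian(joint_angles, link_lengths):
    """2 x n Jacobian of the fingertip position with respect to the joint angles."""
    link_lengths = np.asarray(link_lengths, dtype=float)
    cumulative = np.cumsum(joint_angles)
    n = len(joint_angles)
    J = np.zeros((2, n))
    for i in range(n):
        # Rotating joint i moves every link from i onward.
        J[0, i] = -np.sum(link_lengths[i:] * np.sin(cumulative[i:]))
        J[1, i] = np.sum(link_lengths[i:] * np.cos(cumulative[i:]))
    return J


def solve_ik(target, link_lengths, initial_angles,
             damping=1e-2, iterations=500, tol=1e-6):
    """Damped least-squares IK: iterate until the fingertip is within tol of target."""
    q = np.array(initial_angles, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(iterations):
        error = target - fingertip_position(q, link_lengths)
        if np.linalg.norm(error) < tol:
            break
        J = jacobian(q, link_lengths)
        # dq = J^T (J J^T + lambda^2 I)^{-1} e
        dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), error)
        q += dq
    return q


if __name__ == "__main__":
    # Hypothetical finger: three links of 5, 4 and 3 cm; target is a made-up
    # fingertip position (in metres) standing in for one mocap sample.
    lengths = np.array([0.05, 0.04, 0.03])
    goal = np.array([0.08, 0.05])
    angles = solve_ik(goal, lengths, initial_angles=[0.3, 0.3, 0.3])
    print("joint angles (rad):", angles)
    print("residual (m):", np.linalg.norm(fingertip_position(angles, lengths) - goal))
```

In a retargeting setting, a solver like this would be called once per captured frame, warm-started from the previous frame's solution so that the joint trajectory stays smooth. The damping term trades a little accuracy for stability when the finger is close to fully extended.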