Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality (2403.09308v2)
Abstract: Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and an awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses Large Language Models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also employs Augmented Reality (AR) to provide visual feedback of the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task, which we implement on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and learn new skills (e.g., object grasping).
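To make the LLM-driven waypoint generation concrete, below is a minimal sketch of how a natural-language command and a simple workspace description could be turned into end-effector waypoints. This is an illustration under stated assumptions, not the authors' pipeline: the scene format, prompt wording, JSON waypoint schema, and model name ("gpt-4o") are all assumptions introduced here.

```python
# Minimal sketch (not the paper's exact implementation): prompt an LLM with a
# workspace description and a natural-language command, and parse the returned
# waypoints for a pick-and-place task. Scene format, schema, and model name
# are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical workspace description, e.g. as produced by AR scene understanding.
scene = {
    "objects": [
        {"name": "red_cube", "position_m": [0.45, -0.10, 0.02]},
        {"name": "blue_bin", "position_m": [0.30, 0.25, 0.00]},
    ],
    "robot": "UR10e",
    "workspace_limits_m": {"x": [0.2, 0.7], "y": [-0.4, 0.4], "z": [0.0, 0.5]},
}

command = "Pick up the red cube and place it in the blue bin."

system_prompt = (
    "You generate Cartesian end-effector waypoints for a collaborative robot. "
    "Respond with JSON only: a 'waypoints' list, each entry with 'position_m' "
    "(x, y, z in meters), 'gripper' ('open' or 'closed'), and a short 'label'. "
    "Stay inside the given workspace limits and approach objects from above."
)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": json.dumps({"scene": scene, "command": command})},
    ],
)

waypoints = json.loads(response.choices[0].message.content)
print(json.dumps(waypoints, indent=2))
# In the framework described above, such waypoints would be visualized in AR
# for user confirmation before being sent to the robot controller.
```

In the paper's setup, the visual feedback step lets the user inspect and correct the planned waypoints in AR before any motion is executed on the real robot.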