Swarm-GPT: Combining Large Language Models with Safe Motion Planning for Robot Choreography Design

Published 2 Dec 2023 in cs.RO (arXiv:2312.01059v1)

Abstract: This paper presents Swarm-GPT, a system that integrates large language models (LLMs) with safe swarm motion planning, offering an automated and novel approach to deployable drone swarm choreography. Swarm-GPT enables users to automatically generate synchronized drone performances through natural language instructions. With an emphasis on safety and creativity, Swarm-GPT addresses a critical gap in the field of drone choreography by integrating the creative power of generative models with the effectiveness and safety of model-based planning algorithms. This goal is achieved by prompting the LLM to generate a unique set of waypoints based on extracted audio data. A trajectory planner processes these waypoints to guarantee collision-free and feasible motion. Results can be viewed in simulation prior to execution and modified through dynamic re-prompting. Sim-to-real transfer experiments demonstrate Swarm-GPT's ability to accurately replicate simulated drone trajectories, with a mean sim-to-real root mean square error (RMSE) of 28.7 mm. To date, Swarm-GPT has been successfully showcased at three live events, exemplifying safe real-world deployment of pre-trained models.

Summary

The paper presents a novel integration of LLMs with safe motion planning to generate synchronized drone swarm choreography from natural language.
It employs a dual approach where an LLM converts user prompts to waypoints and a safety algorithm refines trajectories to avoid collisions.
Simulation and real-world experiments confirm reliable sim-to-real transfer, and dynamic re-prompting lets users adjust drone motions interactively.

Introduction

The paper introduces Swarm-GPT, a system that lets users translate natural language instructions into synchronized motions for a swarm of drones while keeping those motions safe. Using an LLM as the interface is notable because it could make programming drone swarms accessible to non-experts.

This work sits at the intersection of robot choreography and the use of LLMs in robotics, with a particular focus on safe swarm motion planning. Prior work has automated choreography only partially and typically still relies on manual tuning and expert input. LLMs, meanwhile, have been applied to tasks such as vision-and-language navigation and high-level planning. Safe decision-making in robot swarms, by contrast, has typically relied on model-based planning to mitigate safety risks.

Methodology

Swarm-GPT's methodology rests on two pillars: an LLM interface that generates choreographies from user prompts, and a safe trajectory optimization module that ensures collision-free motion. The LLM interprets high-level language instructions. It translates them into waypoints timed to the beats of a selected song, using data extracted from the song's audio. A safety filter built on a distributed drone swarm motion planning framework then refines this output. The filter accounts for the drones' capabilities and environmental constraints so that the resulting motions are feasible.
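The sketch below makes the safety requirement more concrete. It checks a beat-aligned set of waypoints for two properties a safe plan must satisfy: a minimum distance between every pair of drones, and a speed limit for each drone.

It is a simplified stand-in and not Swarm-GPT's planner. It rests on these assumptions:
- Drones move in straight lines between consecutive waypoints.
- Each segment is checked only at sampled points.
- The limits (`min_sep = 0.3` m, `max_speed = 1.5` m/s) are hypothetical.
- The function names and the data layout (`waypoints[k][i]` is the position of drone `i` at beat `k`) are illustrative.

```python
import itertools
import math

Vec3 = tuple[float, float, float]


def interpolate(p: Vec3, q: Vec3, s: float) -> Vec3:
    """Point on the straight segment from p to q at fraction s in [0, 1]."""
    return tuple(a + s * (b - a) for a, b in zip(p, q))


def check_choreography(
    beat_times: list[float],
    waypoints: list[list[Vec3]],  # waypoints[k][i] = position of drone i at beat k
    min_sep: float = 0.3,  # hypothetical minimum inter-drone distance in meters
    max_speed: float = 1.5,  # hypothetical speed limit in m/s
    samples: int = 20,  # points sampled per segment for the separation check
) -> list[str]:
    """Return a list of violations; an empty list means every check passed."""
    if len(beat_times) != len(waypoints):
        raise ValueError("need exactly one set of waypoints per beat")
    violations = []
    n_drones = len(waypoints[0])
    for k in range(len(beat_times) - 1):
        dt = beat_times[k + 1] - beat_times[k]
        if dt <= 0:
            raise ValueError("beat times must be strictly increasing")
        start, end = waypoints[k], waypoints[k + 1]
        # Feasibility: average speed needed to reach the next waypoint on the beat.
        for i in range(n_drones):
            speed = math.dist(start[i], end[i]) / dt
            if speed > max_speed:
                violations.append(
                    f"drone {i} needs {speed:.2f} m/s between beats {k} and {k + 1}"
                )
        # Safety: closest approach of each drone pair over the sampled segment.
        for a, b in itertools.combinations(range(n_drones), 2):
            closest = min(
                math.dist(
                    interpolate(start[a], end[a], j / samples),
                    interpolate(start[b], end[b], j / samples),
                )
                for j in range(samples + 1)
            )
            if closest < min_sep:
                violations.append(
                    f"drones {a} and {b} come within {closest:.2f} m "
                    f"between beats {k} and {k + 1}"
                )
    return violations


# Two drones swapping places head-on meet halfway through the move.
swap = check_choreography(
    beat_times=[0.0, 2.0],
    waypoints=[[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]],
)
print(swap)  # ['drones 0 and 1 come within 0.00 m between beats 0 and 1']
```

A check like this only reports violations. A planner of the kind described above goes further and modifies the trajectories until the constraints hold. Because segments are sampled, a fast crossing that falls between two samples could be missed; increasing `samples` reduces that risk.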

Simulation and Experimental Evaluation

The system was evaluated in simulation and in real-world experiments. The real drones closely replicated the simulated trajectories, with a mean sim-to-real RMSE of 28.7 mm. The safety filter removed potential collisions from the generated choreographies in every case tested. The experiments also confirmed that users can re-prompt Swarm-GPT to modify drone behavior, which makes the design process interactive.
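This summary does not give the paper's exact definition of the sim-to-real error, so the sketch below shows one plausible way to compute such a figure under stated assumptions:
- The simulated and recorded trajectories are time-aligned position samples in meters.
- The error for each drone is the root mean square of the Euclidean position error.
- The swarm figure is the mean of those per-drone values.

The function names `position_rmse` and `mean_swarm_rmse` are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def position_rmse(sim: list[Vec3], real: list[Vec3]) -> float:
    """RMSE of the 3-D position error between two time-aligned trajectories."""
    if not sim or len(sim) != len(real):
        raise ValueError("trajectories must be non-empty and time-aligned")
    squared_errors = [math.dist(p, q) ** 2 for p, q in zip(sim, real)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_swarm_rmse(sim_swarm: list[list[Vec3]], real_swarm: list[list[Vec3]]) -> float:
    """Average the per-drone RMSE over all drones in the swarm."""
    if not sim_swarm or len(sim_swarm) != len(real_swarm):
        raise ValueError("need one simulated and one recorded trajectory per drone")
    per_drone = [position_rmse(s, r) for s, r in zip(sim_swarm, real_swarm)]
    return sum(per_drone) / len(per_drone)


# Toy data in meters: drone 0 is off by 30 mm, then 40 mm vertically; drone 1 tracks exactly.
sim = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
real = [[(0.0, 0.0, 1.03), (1.0, 0.0, 1.04)], [(0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
print(f"{1000 * mean_swarm_rmse(sim, real):.1f} mm")  # prints 17.7 mm
```

Averaging per-drone values, rather than pooling all samples, weights every drone equally regardless of how long its trajectory is. That choice is itself an assumption.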

In conclusion, Swarm-GPT is presented as a first-of-its-kind system that combines the creative output of an LLM with model-based safety filters to produce safe, interactive drone swarm choreography. Its deployment at three live events shows that the approach is practical and that natural language instructions can be turned into precise, safe robot motion.