TypeFly: Flying Drones with Large Language Model (2312.14950v2)
Abstract: Recent advancements in robot control using LLMs have demonstrated significant potential, primarily due to LLMs' ability to understand natural language commands and generate executable plans in various languages. However, in real-time and interactive applications involving mobile robots, particularly drones, the sequential token generation process inherent to LLMs introduces substantial latency, i.e., response time, in control plan generation. In this paper, we present a system called TypeFly that tackles this problem using a combination of a novel programming language called MiniSpec and its runtime to reduce the plan generation time and drone response time. That is, instead of asking an LLM to write a program (robotic plan) in the popular but verbose Python, TypeFly has it write the program in MiniSpec, a language specially designed for token efficiency and stream interpretation. Using a set of challenging drone tasks, we show that the design choices made by TypeFly can reduce response time by up to 62% and provide a more consistent user experience, enabling responsive and intelligent LLM-based drone control with efficient completion.
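The general idea behind stream interpretation can be shown with a minimal Python sketch. The runtime consumes the LLM's output as it streams in and dispatches each statement as soon as its terminator arrives, rather than waiting for the whole program to finish generating. The `name(arg);` statement form, the two-letter skill names (`tu`, `mf`, `tp`), and the `stream_interpret` function are hypothetical illustrations chosen for this sketch. They are not MiniSpec's actual grammar or TypeFly's runtime.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical statement form for this sketch only: name(arg);  e.g. "mf(100);"
STMT = re.compile(r"^\s*([a-z_]+)\((.*)\)\s*$")


def stream_interpret(
    chunks: Iterable[str],
    skills: Dict[str, Callable[[str], str]],
) -> List[Tuple[int, str]]:
    """Execute each statement as soon as its ';' terminator arrives.

    Returns (chunk_index, result) pairs so the caller can see how early in
    the token stream each action was dispatched.
    """
    buffer = ""
    executed: List[Tuple[int, str]] = []
    for i, chunk in enumerate(chunks):
        buffer += chunk
        # A single chunk may complete zero, one, or several statements.
        while ";" in buffer:
            stmt, buffer = buffer.split(";", 1)
            match = STMT.match(stmt)
            if match is None:
                raise SyntaxError(f"cannot parse statement: {stmt!r}")
            name, arg = match.groups()
            # Dispatch immediately; later chunks are still being generated.
            executed.append((i, skills[name](arg)))
    if buffer.strip():
        raise SyntaxError(f"incomplete trailing statement: {buffer!r}")
    return executed


# Hypothetical skill abbreviations; real MiniSpec skills may differ.
SKILLS: Dict[str, Callable[[str], str]] = {
    "tu": lambda arg: f"turn {arg} deg",
    "mf": lambda arg: f"move forward {arg} cm",
    "tp": lambda arg: "take picture",
}

if __name__ == "__main__":
    # Simulated LLM output arriving a few characters at a time.
    stream = ["tu(9", "0);m", "f(10", "0);t", "p();"]
    for chunk_index, action in stream_interpret(stream, SKILLS):
        print(f"chunk {chunk_index}: {action}")
```

When run as a script, it prints one line per action, tagged with the index of the chunk at which the action fired. The turn is dispatched at chunk 1, well before the final chunk arrives. A more verbose encoding of the same action would need more tokens before its statement is complete, which would delay dispatch. That is the cost the compact, stream-friendly design is meant to avoid.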