
Octo-planner: On-device Language Model for Planner-Action Agents (2406.18082v1)

Published 26 Jun 2024 in cs.CL and cs.HC

Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach involves using GPT-4 to generate diverse planning queries and responses based on available functions, with subsequent validations to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97\% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at \url{https://huggingface.co/NexaAIDev/octopus-planning}. For the demo, please refer to \url{https://www.nexa4ai.com/octo-planner}.

Authors (4)
  1. Wei Chen (1293 papers)
  2. Zhiyuan Li (304 papers)
  3. Zhen Guo (76 papers)
  4. Yikang Shen (62 papers)
Citations (2)

Summary

  • The paper introduces the Octo-planner framework that separates planning and execution to optimize on-device performance.
  • It employs fine-tuning of the Phi-3 Mini model with GPT-4-generated planning data, achieving a 97% success rate on in-domain task decomposition (98.1% benchmark accuracy with full fine-tuning).
  • The research proposes a multi-LoRA training method to merge diverse function-specific weights, enhancing adaptability for multi-domain queries.

On-Device LLM for Planner-Action Agents: Octo-planner

The paper "Octo-planner: On-device LLM for Planner-Action Agents" introduces a framework aimed at making AI agents practical and efficient on resource-constrained devices. Its centerpiece is the Octo-planner, an on-device planning model designed to work in tandem with action agents such as the Octopus model, enabling efficient planning and execution of tasks where computational resources are limited.

Overview

The Octo-planner framework separates the planning and action execution phases into distinct components—an approach that offers several advantages, including specialization, scalability, and adaptability. The planner first responds to user queries by decomposing tasks into a sequence of sub-steps. These sub-steps are then executed by the Octopus action agent. This modular design is optimized for edge devices, providing a significant improvement in computational efficiency, latency, and energy consumption compared to traditional models that do not separate these phases.
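The planner-to-action handoff described above can be sketched as a simple pipeline. This is a minimal illustration, not the paper's implementation: `plan` and `execute` are hypothetical stand-ins for the fine-tuned Phi-3 Mini planner and the Octopus action agent, which are neural models rather than string functions.

```python
# Sketch of the two-component Planner-Action design: the planner
# decomposes a query into sub-steps; the action agent executes each one.
# `plan` and `execute` are placeholders for the real models.

def plan(query: str) -> list[str]:
    """Decompose a user query into an ordered list of sub-steps."""
    # A real planner would invoke the fine-tuned Phi-3 Mini model here.
    return [f"<step {i}> of: {query}" for i in range(1, 3)]

def execute(step: str) -> str:
    """Map one sub-step to a function call (the action agent's role)."""
    # A real action agent (Octopus) would select and invoke a device function.
    return f"executed: {step}"

def run_agent(query: str) -> list[str]:
    # Planning and execution remain in separate, independently optimizable components.
    return [execute(step) for step in plan(query)]

results = run_agent("take a photo and share it")
```

Keeping the two stages behind separate interfaces is what allows each component to be fine-tuned, swapped, or quantized for the edge device independently.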

The research highlights an innovative approach to fine-tuning the Phi-3 Mini model, using GPT-4 to generate and validate the planning data required for efficient task decomposition. The dataset is carefully curated to ensure high-quality training data through rigorous validation processes. A salient aspect of the paper is the introduction of the multi-LoRA training method, which merges weights from LoRAs trained on distinct function subsets, thereby enabling the handling of multi-domain queries with maintained computational efficiency.
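The generate-then-validate curation loop can be sketched as follows. This is an assumption-laden toy: `teacher_generate` stands in for a GPT-4 API call, and the validation rule (every step must reference a known function) is a simplified proxy for the paper's quality checks.

```python
# Sketch of dataset curation: generate candidate plans with a teacher
# model, then keep only those that pass validation.

def teacher_generate(query: str, functions: list[str]) -> list[str]:
    # Placeholder: a real pipeline would prompt GPT-4 with the query
    # and descriptions of the available functions.
    return [f"call {functions[0]} to handle '{query}'"]

def validate(plan: list[str], functions: list[str]) -> bool:
    # Simplified check: keep a plan only if every step names a known function.
    return all(any(fn in step for fn in functions) for step in plan)

AVAILABLE = ["take_photo", "send_email"]
dataset = []
for q in ["photograph the sunset", "email the report"]:
    candidate = teacher_generate(q, AVAILABLE)
    if validate(candidate, AVAILABLE):
        dataset.append({"query": q, "plan": candidate})
```

The resulting `dataset` entries would then serve as supervised fine-tuning examples for the planner model.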

Key Contributions

  1. Octo-planner Framework: The proposed framework separates planning and action execution for modular optimization, specifically tailored for edge devices.
  2. Fine-tuning Approach: The use of model fine-tuning over in-context learning reduces computational costs and improves response times.
  3. Multi-LoRA Method: This training method allows for the merging of weights from different function-specific LoRAs, enhancing the model's ability to handle complex, multi-domain tasks effectively.
  4. Open-sourcing: The model weights have been open-sourced to support further research and innovation in on-device AI technologies.
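The multi-LoRA idea in contribution 3 can be illustrated with a toy merge of low-rank updates. The paper's exact merge rule and weights are not reproduced here; uniform averaging of the full-rank deltas is an assumption made purely for illustration.

```python
# Toy illustration of multi-LoRA merging: combine the low-rank weight
# updates from LoRAs trained on distinct function subsets into one delta.
import numpy as np

rng = np.random.default_rng(0)
rank, d_in, d_out = 4, 8, 8

# Two LoRAs (A, B factor pairs) trained on different function domains.
lora_media = (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
lora_comms = (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))

def delta(lora):
    A, B = lora
    return A @ B  # the low-rank update added to the frozen base weight

# Uniform-weight merge (an assumption): average the per-domain deltas.
weights = [0.5, 0.5]
merged = sum(w * delta(l) for w, l in zip(weights, [lora_media, lora_comms]))
```

A single merged delta keeps inference cost identical to one LoRA while covering queries that span both function domains.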

Numerical Results

The Octo-planner demonstrated a 97% success rate in an in-domain testing environment, underscoring the efficacy of the fine-tuning approach on resource-constrained devices. Comparative experiments reveal that full fine-tuning of the Phi-3 Mini model achieves the highest benchmark accuracy of 98.1%, while LoRA training methods, though less accurate, offer significant computational efficiency.

Implications and Future Work

The practical implications of this research are profound, particularly for applications requiring real-time processing, enhanced privacy, and offline functionality. The ability to deploy efficient AI agents on devices such as smartphones opens up numerous potential applications in consumer electronics, healthcare, and more.

Theoretically, the research advances the state-of-the-art in model fine-tuning and adaptive AI systems. The multi-LoRA method in particular represents a significant step towards more flexible and scalable AI models capable of handling diverse and complex tasks.

Future developments could explore iterative planning methodologies that refine plans based on real-time observations, improving adaptability to dynamic environments. Extending the applicability of the Octo-planner to other devices and domains such as IoT and robotics will further enhance its utility and robustness.

Conclusion

The Octo-planner framework represents a substantial contribution to the field of on-device AI agents. By addressing key challenges of efficiency, adaptability, and resource constraints, this research paves the way for more practical, accessible, and cost-effective AI applications. The open-sourcing of the model weights provides the research community with valuable tools to build upon, encouraging further innovation and exploration in the field of on-device AI.

In summary, the separation of planning and execution phases, combined with robust fine-tuning techniques and multi-LoRA adaptability, positions the Octo-planner as a promising solution for deploying advanced AI capabilities on edge devices. The research offers both immediate practical applications and a solid foundation for future advancements in AI technology.