Octopus v2: On-device language model for super agent (2404.01744v5)
Abstract: LLMs have shown effectiveness in a variety of software applications, particularly in automated workflow tasks. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale LLMs in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, while decreasing the context length by 95%. Compared to Llama-7B with a RAG-based function-calling mechanism, our method reduces latency by a factor of 35. This brings latency down to levels suitable for deployment across a variety of edge devices in production environments, meeting the performance requirements of real-world applications.
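For readers unfamiliar with function calling, the task the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `take_a_photo` device API and the rule-based dispatcher standing in for the language model are hypothetical, and the real system would generate the structured call from the model's output rather than from hand-written rules.

```python
# Hedged sketch of on-device function calling: a user's natural-language
# request is mapped to a structured call against a device API.
# The API and dispatcher below are illustrative placeholders.

def take_a_photo(camera: str = "back", resolution: str = "1080p") -> str:
    """Hypothetical device API the agent can invoke."""
    return f"photo taken with {camera} camera at {resolution}"

def mock_function_call(user_query: str) -> str:
    """Toy stand-in for the model's function-calling step.

    An on-device model like the one described would emit the function
    name and arguments directly; here a keyword rule plays that role.
    """
    if "selfie" in user_query.lower():
        return take_a_photo(camera="front")
    return take_a_photo()

print(mock_function_call("Take a selfie for me"))
```

The point of the sketch is the shape of the task: the model must select the right function and fill in its arguments from free-form text, and doing this on-device avoids the round-trip and privacy costs of a cloud call.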