
Concise and Precise Context Compression for Tool-Using Language Models (2407.02043v1)

Published 2 Jul 2024 in cs.CL

Abstract: By reading documentation provided in the context, tool-using LLMs can dynamically extend their capabilities with external tools. The cost is that lengthy documentation must be fed to the model every time a tool is used, occupying the input window and slowing down decoding. Given the progress in general-purpose compression, soft context compression is a suitable approach to alleviate this problem. However, when compressing tool documentation, existing methods suffer from two weaknesses: loss of key information (specifically, tool/parameter name errors) and difficulty in adjusting the length of the compressed sequence to the documentation length. To address these problems, we propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using LLMs. 1) A selective compression strategy mitigates key information loss by deliberately retaining key information as raw text tokens. 2) A block compression strategy divides tool documentation into short chunks and then employs a fixed-length compression model to achieve variable-length compression, allowing the compression ratio to be adjusted flexibly. Results on API-Bank and APIBench show that our approach achieves performance comparable to the upper-bound baseline at compression ratios of up to 16x.
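Taken together, the two strategies suggest a simple pipeline: keep tool and parameter names as raw tokens, chunk the rest of the documentation, and run each chunk through a fixed-length compressor. The sketch below is only an illustration of that idea, not the paper's implementation; FixedLengthCompressor, compress_documentation, CHUNK_SIZE, and SUMMARY_SLOTS are assumed names, and the placeholder <mem_i> markers stand in for the soft summary embeddings a real compressor would produce.

```python
# Minimal sketch of the selective + block compression idea from the abstract.
# All names and values here are hypothetical, not the paper's implementation.

from typing import List

CHUNK_SIZE = 128       # assumed chunk length in tokens
SUMMARY_SLOTS = 8      # assumed fixed number of summary tokens per chunk


class FixedLengthCompressor:
    """Stand-in for a model that maps any token chunk to SUMMARY_SLOTS soft tokens."""

    def compress(self, chunk: List[str]) -> List[str]:
        # A real compressor would emit SUMMARY_SLOTS embedding vectors; placeholder
        # markers keep this sketch self-contained and runnable.
        return [f"<mem_{i}>" for i in range(SUMMARY_SLOTS)]


def compress_documentation(doc_tokens: List[str], key_tokens: List[str]) -> List[str]:
    """Compress tool documentation into a short summary sequence.

    key_tokens (e.g. tool and parameter names) are kept as raw text so they
    cannot be corrupted (selective compression); the documentation is split
    into short chunks, each compressed to a fixed number of summary tokens,
    so the total compressed length grows with the documentation length
    (block compression yielding variable-length compression overall).
    """
    compressor = FixedLengthCompressor()
    compressed: List[str] = []
    for start in range(0, len(doc_tokens), CHUNK_SIZE):
        chunk = doc_tokens[start:start + CHUNK_SIZE]
        compressed.extend(compressor.compress(chunk))
    # Selective compression: prepend the raw key tokens to the soft summary.
    return key_tokens + compressed
```

With the assumed values CHUNK_SIZE = 128 and SUMMARY_SLOTS = 8, each chunk is reduced 16x, matching the highest compression ratio mentioned in the abstract; longer documentation simply yields more chunks, which is what makes the overall compressed length adjustable.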

References (29)
  1. Recurrent Memory Transformer. Advances in Neural Information Processing Systems, 35:11079–11091.
  2. Binding Language Models in Symbolic Languages.
  3. Adapting Language Models to Compress Contexts. arXiv preprint arXiv:2305.14788.
  4. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality.
  5. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168.
  6. Together Computer. 2023. RedPajama: An Open Source Recipe to Reproduce LLaMA Training Dataset.
  7. PAL: Program-Aided Language Models.
  8. In-Context Autoencoder for Context Compression in a Large Language Model. arXiv preprint arXiv:2307.06945.
  9. Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv preprint arXiv:2207.05608.
  10. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. arXiv preprint arXiv:2310.05736.
  11. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. arXiv preprint arXiv:2310.06839.
  12. Internet-Augmented Dialogue Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8460–8478, Dublin, Ireland. Association for Computational Linguistics.
  13. API-Bank: A Benchmark for Tool-Augmented LLMs. arXiv preprint arXiv:2304.08244.
  14. Compressing Context to Enhance Inference Efficiency of Large Language Models. arXiv preprint arXiv:2310.06201.
  15. WebGPT: Browser-Assisted Question-Answering with Human Feedback.
  16. OpenAI. 2023. ChatGPT Plugins.
  17. Gorilla: Large Language Model Connected with Massive APIs. arXiv preprint arXiv:2305.15334.
  18. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs. arXiv preprint arXiv:2307.16789.
  19. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv preprint arXiv:2302.04761.
  20. SlimPajama: A 627B Token Cleaned and Deduplicated Version of RedPajama.
  21. ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases. arXiv preprint arXiv:2306.05301.
  22. Stanford Alpaca: An Instruction-Following LLaMA Model. https://github.com/tatsu-lab/stanford_alpaca.
  23. LaMDA: Language Models for Dialog Applications.
  24. LLaMA: Open and Efficient Foundation Language Models.
  25. OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data. arXiv preprint arXiv:2309.11235.
  26. Yumeng Wang and Zhenyang Xiao. 2024. LoMA: Lossless Compressed Memory Attention. arXiv preprint arXiv:2401.09486.
  27. On the Tool Manipulation Capability of Open-Source Large Language Models. arXiv preprint arXiv:2305.16504.
  28. GPT4Tools: Teaching Large Language Model to Use Tools via Self-Instruction. arXiv preprint arXiv:2305.18752.
  29. ReAct: Synergizing Reasoning and Acting in Language Models.
Citations (3)
