Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression (2404.00489v2)

Published 30 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have shown exceptional abilities across a wide range of natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead to substandard results in terms of readability/interpretability of the compressed prompt, with a detrimental impact on prompt utility. To address this, we propose Prompt-SAW: Prompt compresSion via Relation AWare graphs, an effective strategy for prompt compression over task-agnostic and task-aware prompts. Prompt-SAW uses the prompt's textual information to build a graph and then extracts key information elements from the graph to produce the compressed prompt. We also propose GSM8K-aug, an extended version of the existing GSM8K benchmark for task-agnostic prompts, to provide a comprehensive evaluation platform. Experimental evaluation using benchmark datasets shows that prompts compressed by Prompt-SAW are not only better in terms of readability, but they also outperform the best-performing baseline models by up to 10.1 and 77.1, respectively, for task-agnostic and task-aware settings, while compressing the original prompt text by 34.9 and 56.7.
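
At a high level, the abstract describes a pipeline that turns the prompt into a graph of information elements, scores those elements, and keeps only the most informative ones when rendering the compressed prompt. The sketch below is a minimal, hypothetical illustration of that general idea only, not the Prompt-SAW implementation: the naive triple extractor, the overlap-based relevance score, and the `keep_ratio` budget are stand-ins chosen for brevity (a real system would use an information-extraction model and a learned or entropy-based scoring step).

```python
# Illustrative sketch only: a toy graph-style prompt compressor.
# All components here are hypothetical stand-ins, not the Prompt-SAW method.
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, relation, object) element of the prompt graph."""
    subject: str
    relation: str
    obj: str


def extract_triples(prompt: str) -> list[Triple]:
    """Very naive extraction: one triple per sentence of the form
    '<subject> <relation> <rest>'. A real system would use an IE model."""
    triples = []
    for sentence in prompt.split("."):
        words = sentence.split()
        if len(words) >= 3:
            triples.append(Triple(words[0], words[1], " ".join(words[2:])))
    return triples


def score(triple: Triple, query: str) -> float:
    """Hypothetical relevance score: token overlap between the triple
    and the query the compressed prompt should still support."""
    triple_tokens = set(f"{triple.subject} {triple.relation} {triple.obj}".lower().split())
    query_tokens = set(query.lower().split())
    return len(triple_tokens & query_tokens) / max(len(triple_tokens), 1)


def compress(prompt: str, query: str, keep_ratio: float = 0.5) -> str:
    """Keep the highest-scoring triples within the budget and render them
    back into readable text, preserving interpretability."""
    triples = extract_triples(prompt)
    budget = max(1, int(len(triples) * keep_ratio))
    kept = sorted(triples, key=lambda t: score(t, query), reverse=True)[:budget]
    return ". ".join(f"{t.subject} {t.relation} {t.obj}" for t in kept)


if __name__ == "__main__":
    prompt = ("Alice manages the data team. The data team owns the sales warehouse. "
              "Bob enjoys hiking on weekends. The warehouse stores quarterly revenue.")
    print(compress(prompt, query="Who owns the sales warehouse", keep_ratio=0.5))
```

Because the compressed output is reassembled from whole graph elements rather than individually dropped tokens, it stays human-readable, which is the property the abstract highlights over token-level compression baselines.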

Authors (11)
  1. Muhammad Asif Ali (18 papers)
  2. Zhengping Li (3 papers)
  3. Shu Yang (178 papers)
  4. Keyuan Cheng (9 papers)
  5. Yang Cao (295 papers)
  6. Tianhao Huang (10 papers)
  7. Lijie Hu (50 papers)
  8. Lu Yu (87 papers)
  9. Di Wang (407 papers)
  10. Guimin Hu (11 papers)
  11. Weimin Lyu (19 papers)
Citations (9)