Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression (2404.00489v2)
Abstract: LLMs have shown exceptional abilities across a wide range of natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts yield substandard results in terms of the readability and interpretability of the compressed prompt, with a detrimental impact on prompt utility. To address this, we propose Prompt-SAW (Prompt compresSion via Relation AWare graphs), an effective strategy for prompt compression over task-agnostic and task-aware prompts. Prompt-SAW uses the prompt's textual information to build a graph and then extracts key information elements from the graph to form the compressed prompt. We also propose GSM8K-aug, an extended version of the existing GSM8K benchmark for task-agnostic prompts, to provide a comprehensive evaluation platform. Experimental evaluation on benchmark datasets shows that prompts compressed by Prompt-SAW are not only more readable but also outperform the best-performing baseline models by up to 10.1 and 77.1, respectively, for task-agnostic and task-aware settings, while compressing the original prompt text by 34.9% and 56.7%.
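The abstract describes a two-step pipeline: build a relation-aware graph from the prompt text, then keep only its key information elements and re-serialize them as the compressed prompt. Below is a minimal, runnable sketch of that idea. It is not the paper's actual method: the `extract_triples` stand-in and the frequency-based centrality scoring are hypothetical placeholders, since the paper's extraction and selection procedures are not detailed in this summary.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def extract_triples(prompt: str) -> List[Triple]:
    """Toy stand-in for a relation extractor (e.g., an OpenIE-style tool).

    Prompt-SAW builds a relation-aware graph from the prompt text; here we
    simply split on periods and treat each sentence's first two words as
    subject and relation so that the sketch is self-contained and runnable.
    """
    triples = []
    for sentence in prompt.split("."):
        words = sentence.split()
        if len(words) >= 3:
            triples.append((words[0], words[1], " ".join(words[2:])))
    return triples


def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the highest-scoring graph elements and re-serialize them as text."""
    triples = extract_triples(prompt)
    if not triples:
        return prompt
    # Hypothetical scoring rule: a subject that recurs across triples is
    # treated as more central to the graph, so its triples are kept first.
    subject_counts = Counter(subj for subj, _, _ in triples)
    ranked = sorted(triples, key=lambda t: subject_counts[t[0]], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    # Re-serializing kept triples as plain sentences preserves readability,
    # in contrast to token-dropping compressors.
    return ". ".join(f"{s} {r} {o}" for s, r, o in kept) + "."
```

For instance, `compress_prompt(long_prompt, keep_ratio=0.35)` would retain roughly the most central third of the extracted triples as a readable compressed prompt; the real system's graph construction and element selection are considerably more sophisticated.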
Authors: Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang, Guimin Hu, Weimin Lyu