Unifying Demonstration Selection and Compression for In-Context Learning (2405.17062v2)
Abstract: In-context learning (ICL) enables LLMs to exhibit remarkable emergent capabilities across a wide range of scenarios. Unfortunately, introducing demonstrations can dramatically inflate the prompt length, placing a significant burden on hardware. Moreover, randomly chosen demonstrations usually yield only limited improvements in ICL, which necessitates selecting demonstrations from the accessible candidates. Previous studies introduce extra modules to perform demonstration compression or selection independently. In this paper, we propose UniICL, an ICL framework that unifies demonstration selection, demonstration compression, and final response generation via a single frozen LLM. Specifically, UniICL first projects the actual demonstrations and the inference text input into short virtual tokens. These virtual tokens are then used to select suitable demonstrations by measuring semantic similarity in the latent space between the candidate demonstrations and the inference input. Finally, the inference text input, together with the selected virtual demonstrations, is fed into the same frozen LLM for response generation. Notably, UniICL is a parameter-efficient framework with only 17M trainable parameters, all originating from the projection layer. We conduct experiments and analyses on in-domain and out-of-domain datasets covering both generative and understanding tasks, encompassing ICL scenarios with plentiful and limited demonstration candidates. Results show that UniICL effectively unifies $12 \times$ compression, demonstration selection, and response generation, efficiently scaling the baseline from 4-shot to 64-shot ICL on IMDb within 24 GB of CUDA memory allocation.
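To make the described pipeline concrete, below is a minimal sketch of the compress-select-generate flow from the abstract, assuming a HuggingFace-style frozen causal LM. The class and method names (`UniICLSketch`, `compress`, `select_demonstrations`), the mean-pooling of virtual tokens, and the choice of 16 virtual tokens are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the UniICL pipeline: compress demonstrations into virtual
# tokens, select by latent-space similarity, then generate with the same frozen LM.
# Helper names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn.functional as F


class UniICLSketch(torch.nn.Module):
    def __init__(self, frozen_lm, hidden_size, num_virtual_tokens=16):
        super().__init__()
        self.lm = frozen_lm  # the single frozen LLM, reused for all three stages
        for p in self.lm.parameters():
            p.requires_grad = False
        # Only trainable component: a projection over the LM's hidden states
        # (the paper reports ~17M trainable parameters from this layer).
        self.projection = torch.nn.Linear(hidden_size, hidden_size)
        self.k = num_virtual_tokens

    @torch.no_grad()
    def _encode(self, input_embeds):
        # Run the frozen LM once to obtain contextual hidden states.
        out = self.lm(inputs_embeds=input_embeds, output_hidden_states=True)
        return out.hidden_states[-1]

    def compress(self, input_embeds):
        # Project the last k hidden states into short "virtual tokens".
        hidden = self._encode(input_embeds)
        return self.projection(hidden[:, -self.k:, :])  # (batch, k, hidden)

    def select_demonstrations(self, query_virtual, candidate_virtuals, top_n=4):
        # Mean-pool virtual tokens and rank candidates by cosine similarity
        # in the latent space against the compressed inference input.
        q = query_virtual.mean(dim=1)  # (1, hidden)
        c = torch.stack([v.mean(dim=1).squeeze(0) for v in candidate_virtuals])
        scores = F.cosine_similarity(q, c)  # (num_candidates,)
        top = scores.topk(min(top_n, len(candidate_virtuals))).indices
        return [candidate_virtuals[i] for i in top]

    def generate(self, query_embeds, selected_virtuals, **gen_kwargs):
        # Prepend the selected virtual demonstrations to the query embeddings
        # and let the same frozen LM decode the response.
        prefix = torch.cat(selected_virtuals, dim=1)
        inputs = torch.cat([prefix, query_embeds], dim=1)
        return self.lm.generate(inputs_embeds=inputs, **gen_kwargs)
```

In this sketch the frozen LM is called three times per query (to compress candidates, to compress the inference input, and to generate), while only the projection layer receives gradients, mirroring the parameter-efficient setup the abstract describes.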
Authors: Jun Gao, Ziqiang Cao, Wenjie Li