Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models (2310.01691v2)
Published 2 Oct 2023 in cs.CL and cs.AI
Abstract: Prompt tuning in NLP has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, in which source prompts are encoded into a relative space and corresponding target prompts are searched for in the target models' embedding spaces. Experimental results confirm the effectiveness of our method, showing that the 'task semantics' carried by continuous prompts can generalize across various LLMs. Moreover, we find that combining 'task semantics' from multiple source models further enhances the generalizability of the transfer.
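The sketch below is a minimal illustration of the kind of pipeline the abstract describes, not the paper's released implementation. It assumes both models embed a shared set of anchor tokens, encodes the tuned source prompt by its cosine similarity to the source model's anchor embeddings (a "relative" representation), and then optimizes a target-space prompt so that its relative encoding matches. All function names, shapes, and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def relative_encode(prompt, anchors):
    """Encode prompt vectors by cosine similarity to a shared set of
    anchor embeddings, yielding a model-agnostic 'relative' representation."""
    # prompt: (prompt_len, d_model), anchors: (num_anchors, d_model)
    return F.cosine_similarity(
        prompt.unsqueeze(1), anchors.unsqueeze(0), dim=-1
    )  # -> (prompt_len, num_anchors)

def transfer_prompt(source_prompt, source_anchors, target_anchors,
                    steps=1000, lr=0.1):
    """Search for a target-space prompt whose relative encoding matches
    the source prompt's relative encoding (no task data needed)."""
    # Fixed target encoding derived from the tuned source prompt.
    goal = relative_encode(source_prompt, source_anchors).detach()

    # Initialize a free prompt in the target model's embedding space.
    prompt_len, d_target = source_prompt.size(0), target_anchors.size(1)
    target_prompt = torch.randn(prompt_len, d_target, requires_grad=True)

    optimizer = torch.optim.Adam([target_prompt], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(relative_encode(target_prompt, target_anchors), goal)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return target_prompt.detach()
```

Under these assumptions the search is zero-shot: it only uses the frozen target model's embedding geometry (via its anchor embeddings), never the downstream task's training data.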
Authors: Zijun Wu, Yongkang Wu, Lili Mou