On Conditional and Compositional Language Model Differentiable Prompting (2307.01446v1)
Abstract: Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules: neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization, and multilingual translation, while needing fewer trainable parameters.
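To make the architecture concrete, below is a minimal illustrative sketch of the idea described in the abstract: a bank of rule modules, an attention-style selector that routes an instruction embedding to those rules, and a projection of the mixed rule output into a sequence of continuous prompt vectors for a frozen PLM. All names (`RuleModule`, `PRopSSketch`) and the specific routing, dimensions, and soft (rather than discrete) rule selection are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleModule(nn.Module):
    """One 'rule': a small MLP intended to specialize in transforming
    a particular pattern of instruction/metadata embeddings."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class PRopSSketch(nn.Module):
    """Hypothetical production-system-style prompt generator: an
    instruction embedding is routed, via attention over learned rule
    keys, to a mixture of rule modules whose combined output is
    expanded into a continuous prompt for a frozen PLM."""
    def __init__(self, dim: int = 768, n_rules: int = 8,
                 prompt_len: int = 10, hidden: int = 512):
        super().__init__()
        self.rules = nn.ModuleList(RuleModule(dim, hidden) for _ in range(n_rules))
        self.rule_keys = nn.Parameter(torch.randn(n_rules, dim))
        self.query = nn.Linear(dim, dim)
        self.to_prompt = nn.Linear(dim, prompt_len * dim)
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, instr_emb: torch.Tensor) -> torch.Tensor:
        # instr_emb: (batch, dim) pooled embedding of the task
        # instruction or input metadata.
        q = self.query(instr_emb)                                  # (B, D)
        scores = q @ self.rule_keys.t() / self.dim ** 0.5          # (B, R)
        weights = F.softmax(scores, dim=-1)                        # soft rule selection
        outs = torch.stack([r(instr_emb) for r in self.rules], 1)  # (B, R, D)
        mixed = (weights.unsqueeze(-1) * outs).sum(dim=1)          # (B, D)
        # Expand into a sequence of continuous prompt vectors that would
        # be prepended to the frozen PLM's input embeddings.
        return self.to_prompt(mixed).view(-1, self.prompt_len, self.dim)

# Usage: a 10-vector prompt for each of 2 instruction embeddings.
prompts = PRopSSketch()(torch.randn(2, 768))
print(prompts.shape)  # torch.Size([2, 10, 768])
```

Note that only the prompt generator above would be trained; the PLM stays frozen, which is what keeps the trainable-parameter count low relative to full fine-tuning.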