Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation (2404.12596v1)
Abstract: Over the past year, the field of Natural Language Generation (NLG) has experienced an exponential surge, largely due to the introduction of Large Language Models (LLMs). These models have delivered highly effective performance across a wide range of Natural Language Processing and Generation tasks. However, applying them to domain-specific tasks such as paraphrasing presents significant challenges: their extensive parameter counts make them difficult to run on commodity hardware, and their slow inference leads to high costs in production settings. In this study, we address these obstacles by using an LLM to develop three distinct paraphrasing models through a method known as sequence-level knowledge distillation. The distilled models maintain the quality of the paraphrases generated by the LLM while offering faster inference and the ability to generate diverse paraphrases of comparable quality. Notably, they exhibit syntactic diversity while also preserving lexical diversity, a combination previously uncommon due to data-quality issues in existing datasets and rarely observed in neural approaches. Human evaluation shows only a 4% drop in performance compared to the LLM teacher model used in the distillation process, despite the distilled models being 1000 times smaller. This research provides a significant contribution to the NLG field, offering a more efficient and cost-effective solution for paraphrasing tasks.
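To make the distillation pipeline concrete, the snippet below is a minimal sketch of sequence-level knowledge distillation for paraphrasing: the student is trained with ordinary cross-entropy on whole sequences produced by the teacher, rather than on the teacher's token-level distribution. The teacher-generated pairs, the `google/flan-t5-small` student checkpoint, the `"paraphrase: "` prompt prefix, and the toy training loop are illustrative assumptions, not the paper's actual data or model choices.

```python
# Minimal sketch of sequence-level knowledge distillation for paraphrasing.
# Assumption: the LLM teacher's paraphrases were collected offline (e.g. via an API)
# and are available as (source, paraphrase) pairs.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Teacher-generated pseudo-targets (toy examples, not from the paper's dataset).
teacher_pairs = [
    ("The meeting was postponed until next week.",
     "They pushed the meeting back to next week."),
    ("She quickly finished the report.",
     "The report was completed by her in no time."),
]

# Hypothetical student model; any small seq2seq model (T5, FLAN-T5, BART) fits here.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

def collate(batch):
    sources, targets = zip(*batch)
    enc = tokenizer(["paraphrase: " + s for s in sources],
                    padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, truncation=True,
                       return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(teacher_pairs, batch_size=2, shuffle=True, collate_fn=collate)

student.train()
for epoch in range(3):
    for batch in loader:
        # Sequence-level KD objective: the student imitates the teacher's
        # full output sequences via standard cross-entropy.
        loss = student(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice the distilled student is then decoded with the usual generation settings (e.g. sampling or beam search) to produce diverse paraphrases at a fraction of the teacher's inference cost.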