Towards Human Understanding of Paraphrase Types in ChatGPT (2407.02302v1)
Abstract: Paraphrases reflect the human ability to understand expressions presented in many different ways. Current paraphrase evaluations of LLMs are primarily binary, offering limited interpretability of specific text changes. Atomic paraphrase types (APTs) decompose paraphrases into distinct linguistic changes and offer a granular view of the flexibility of linguistic expression (e.g., a shift in syntax or vocabulary). In this study, we assess human preferences for ChatGPT-generated English paraphrases across ten APTs and five prompting techniques. We introduce APTY (Atomic Paraphrase TYpes), a dataset of 500 sentence-level and word-level annotations by 15 annotators. The dataset also provides a human preference ranking of paraphrases with different types, which can be used to fine-tune models with RLHF and DPO methods. Our results reveal that ChatGPT can generate simple APTs, such as additions and deletions, but struggles with complex structures (e.g., subordination changes). This study contributes to understanding which aspects of paraphrasing LLMs have already mastered and which remain elusive. In addition, our curated datasets can be used to develop LLMs with specific linguistic capabilities.
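The abstract notes that the preference rankings in APTY can feed DPO-style fine-tuning. As a minimal sketch (not the authors' pipeline; the function name and example sentences are hypothetical), a full ranking of paraphrases can be unrolled into the (prompt, chosen, rejected) pairs that DPO trainers typically expect:

```python
from itertools import combinations

def rankings_to_dpo_pairs(source, ranked_paraphrases):
    """Turn a best-first human ranking of paraphrases into
    (prompt, chosen, rejected) pairs for DPO-style training.

    Every higher-ranked paraphrase becomes `chosen` against
    each lower-ranked one, so a ranking of n items yields
    n*(n-1)/2 preference pairs."""
    pairs = []
    for better, worse in combinations(ranked_paraphrases, 2):
        pairs.append({"prompt": source, "chosen": better, "rejected": worse})
    return pairs

# Hypothetical example: three paraphrases of one sentence, best first.
pairs = rankings_to_dpo_pairs(
    "The cat sat on the mat.",
    ["A cat was sitting on the mat.",
     "On the mat sat the cat.",
     "Cat mat sit."],
)
print(len(pairs))  # 3 pairwise preferences from a ranking of 3
```

Unrolling rankings into all pairwise comparisons is a common design choice because DPO's loss is defined over pairs, not full orderings; the actual pair construction used with APTY may differ.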
- Dominik Meier
- Jan Philip Wahle
- Terry Ruas
- Bela Gipp