RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting (2305.15685v2)

Published 25 May 2023 in cs.CL and cs.AI

Abstract: LLMs have demonstrated impressive capabilities in creative tasks such as storytelling and e-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it can be challenging for them to perform text rewriting tasks. Most studies of rewriting tasks focus on a particular transformation type within the boundaries of single sentences. In this work, we develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural language, including 1) generating rewriting instruction data from Wiki edits and public corpora through instruction generation and chain-of-thought prompting; and 2) collecting comparison data for reward-model training through a new ranking function. To facilitate this research, we introduce OpenRewriteEval, a novel benchmark that covers a wide variety of rewriting types expressed through natural language instructions. Our results show significant improvements over a variety of baselines. The public repository is available on GitHub under Google Research (https://github.com/google-research/google-research/tree/master/rewritelm).


Summary

  • The paper presents novel strategies for instruction tuning and reinforcement learning, enabling effective cross-sentence rewriting.
  • It employs innovative data generation methods using Wikipedia edits and synthetic prompts to create a diverse rewriting dataset.
  • The model outperforms existing LLMs on the OpenRewriteEval benchmark, setting new standards in tone, style, and content preservation.

Overview of "RewriteLM: An Instruction-Tuned LLM for Text Rewriting"

"RewriteLM" presents a novel approach to text rewriting by instruction tuning LLMs for cross-sentence tasks. These tasks involve diverse rewordings and structural changes, essential for both professional and personal communication.

Methodology and Contributions

The paper outlines two main contributions:

  1. New Strategies for Instruction Tuning and Reinforcement Learning: Recognizing that most rewriting studies remain confined to single-sentence edits, the paper extends rewriting to the cross-sentence setting through two strategies:
    • Data Generation: By using Wikipedia edits and public corpora, the authors create a varied set of rewriting instructions. They leverage chain-of-thought (CoT) prompting and synthetic data generation to enhance dataset diversity.
    • Reward Model Training: Instead of relying on human labelers, a novel ranking function scores rewrites along key dimensions such as content preservation and linguistic variability, effectively automating data collection for reward modeling.
  2. OpenRewriteEval Benchmark: The introduction of this benchmark represents a significant advance, covering numerous rewriting types beyond narrow task limitations. It is explicitly designed to evaluate cross-sentence rewriting, incorporating elements such as tone and style transfer.
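The ranking idea behind the reward-model data collection can be sketched as a simple scoring heuristic. Everything below (the word-overlap preservation proxy, the similarity-based diversity term, and their product) is an illustrative assumption, not the paper's actual function:

```python
from difflib import SequenceMatcher

def rank_rewrites(source: str, candidates: list[str]) -> list[str]:
    """Rank candidate rewrites of `source`, best first.

    Hypothetical heuristic: a good rewrite keeps the source's content
    words (preservation) while changing the surface form (diversity).
    Multiplying the two terms means a verbatim copy scores zero.
    """
    def score(candidate: str) -> float:
        src_words = set(source.lower().split())
        cand_words = set(candidate.lower().split())
        # Content preservation: fraction of source words kept in the rewrite.
        preservation = len(src_words & cand_words) / max(len(src_words), 1)
        # Edit diversity: 1 minus character-level similarity to the source.
        diversity = 1.0 - SequenceMatcher(None, source, candidate).ratio()
        return preservation * diversity

    return sorted(candidates, key=score, reverse=True)
```

A ranking of this kind can order model outputs into comparison pairs without human annotation, which is the role the paper's ranking function plays in reward-model training.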

Results and Evaluation

The authors conduct extensive empirical studies using the OpenRewriteEval benchmark, demonstrating that RewriteLM outperforms existing baselines, including state-of-the-art pre-trained models. Key findings include:

  • Models like Rewrite-PaLM and Rewrite-PaLM 2, fine-tuned from their foundation models, consistently improve on those models' performance across multiple rewriting tasks.
  • Reinforcement learning applied on top of supervised tuning further improves effectiveness, culminating in Rewrite-RL_{r/w}-PaLM 2, which sets new performance standards for text rewriting.
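One concrete way to quantify how much a model changed its input, as edit-based rewrite evaluation typically does, is a word-level Levenshtein distance. The paper relies on several automatic metrics; this particular implementation is only an illustration:

```python
def word_edit_distance(source: str, rewrite: str) -> int:
    """Word-level Levenshtein distance between a source and its rewrite.

    Counts the minimum number of word insertions, deletions, and
    substitutions needed to turn `source` into `rewrite`.
    """
    a, b = source.split(), rewrite.split()
    # prev[j] holds the distance between the current prefix of `a`
    # and the first j words of `b`.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]
```

A distance of zero flags a degenerate "rewrite" that copied the input, while a very large distance can signal lost content, so edit-based scores are usually read alongside preservation metrics.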

The paper also provides a comprehensive comparison of various LLMs, including prominent models such as InstructGPT, highlighting RewriteLM’s superior ability to produce coherent and user-aligned text outputs.

Implications and Future Prospects

The implications of this research are twofold:

  1. Practical Impact: RewriteLM offers a scalable solution for text rewriting, which has significant applications in content creation, editing, and adaptation across industries.
  2. Theoretical Contributions: By advancing instruction-tuning methodologies and benchmarking novel cross-sentence rewriting frameworks, this research provides a foundation for extending LLM capabilities further.

Future work could explore transferring these methodologies to other language-dependent tasks, potentially enhancing versatility and user specificity in AI-driven text manipulation.

Conclusion

In summary, the paper presents noteworthy advances in leveraging LLMs for more complex, user-centric rewriting tasks, culminating in a state-of-the-art model for cross-sentence rewrites. Its methodological innovations and the OpenRewriteEval benchmark offer robust tools for continued progress in AI text generation.