Is it Possible to Modify Text to a Target Readability Level? An Initial Investigation Using Zero-Shot Large Language Models (2309.12551v2)
Abstract: Text simplification is a common task in which a text is adapted to make it easier to understand. Similarly, text elaboration can make a passage more sophisticated, offering a way to control the complexity of reading comprehension tests. However, simplification and elaboration only alter readability relative to the source text. It is useful to directly modify the readability of any text to an absolute target level so that it can serve a diverse audience; ideally, the readability of the generated text should be independent of that of the source. We therefore propose a novel readability-controlled text modification task, which requires generating eight versions of each input text at different target readability levels, and we introduce corresponding evaluation metrics. Our baselines use ChatGPT and Llama-2 in a zero-shot setting, with an extension approach that adds a two-step process (generating paraphrases by passing the text through the LLM twice). These zero-shot approaches can push the readability of the paraphrases in the desired direction, but the final readability remains correlated with that of the original text. We also find that larger shifts in readability produce larger drops in semantic and lexical similarity between the source and target texts.
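The abstract describes the method only at a high level; the sketch below makes the two-step zero-shot procedure and the readability scoring concrete. It assumes Flesch Reading Ease (via the `textstat` package) as the readability measure and a generic `llm` callable standing in for ChatGPT or Llama-2; the prompt wording and the eight target levels are hypothetical, not the paper's exact protocol.

```python
# Minimal sketch of readability-controlled text modification with a
# zero-shot LLM. Assumptions: `llm` is any text-in/text-out wrapper around
# ChatGPT or Llama-2; TARGET_LEVELS are hypothetical Flesch Reading Ease
# (FRE) targets standing in for the paper's eight levels.
import textstat  # pip install textstat

TARGET_LEVELS = [5, 20, 40, 55, 65, 75, 85, 95]  # hypothetical FRE targets

def make_prompt(text: str, target_fre: float) -> str:
    # Illustrative prompt wording; the paper's actual prompts may differ.
    return (
        f"Paraphrase the following text so that its Flesch Reading Ease "
        f"score is approximately {target_fre:.0f}, preserving the meaning."
        f"\n\n{text}"
    )

def rewrite_to_target(llm, text: str, target_fre: float,
                      two_step: bool = False) -> str:
    """Zero-shot rewrite toward an absolute readability target.

    With two_step=True, the intermediate paraphrase is passed through the
    LLM a second time, mirroring the extension approach in the abstract.
    """
    output = llm(make_prompt(text, target_fre))
    if two_step:
        output = llm(make_prompt(output, target_fre))
    return output

def readability_report(source: str, candidate: str,
                       target_fre: float) -> dict:
    """Compare achieved readability against the requested target."""
    return {
        "target_fre": target_fre,
        "achieved_fre": textstat.flesch_reading_ease(candidate),
        "source_fre": textstat.flesch_reading_ease(source),
    }
```

In a full evaluation one would sweep all eight targets for every source text and pair this readability report with semantic and lexical similarity measures (e.g., BERTScore and word overlap), since the abstract reports that similarity to the source drops as the readability shift grows.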
Authors: Asma Farajidizaji, Vatsal Raina, Mark Gales