
Data and Approaches for German Text simplification -- towards an Accessibility-enhanced Communication (2312.09966v1)

Published 15 Dec 2023 in cs.CL and cs.AI

Abstract: This paper examines the current state of the art in German text simplification, focusing on parallel and monolingual German corpora. It reviews neural LLMs for simplifying German texts and assesses their suitability for legal texts and accessibility requirements. Our findings highlight the need for additional training data and for approaches better suited to the specific linguistic characteristics of German, as well as the importance of the needs and preferences of target groups with cognitive or language impairments. To address these research gaps, the authors launched the interdisciplinary OPEN-LS project in April 2023. The project aims to develop a framework for text formats tailored to individuals with low literacy levels, integrate legal texts, and enhance comprehensibility for those with linguistic or cognitive impairments. It will also explore cost-effective ways to enhance the data with audience-specific illustrations using image-generating AI. For further, up-to-date information, visit the project homepage: https://open-ls.entavis.com
