
Exploring Automatic Text Simplification of German Narrative Documents (2312.09907v1)

Published 15 Dec 2023 in cs.CL and cs.AI

Abstract: In this paper, we apply transformer-based Natural Language Generation (NLG) techniques to the problem of text simplification. Currently, only a few German datasets are available for text simplification, even fewer with large, aligned documents, and not a single one with narrative texts. We explore to what degree modern NLG techniques can be applied to the simplification of German narrative text, using Longformer attention and a pre-trained mBART model. Our findings indicate that the existing approaches for German are not able to solve the task properly. We conclude with a few directions for future research to address this problem.
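The Longformer attention the abstract refers to replaces full self-attention with a sparse local pattern, so that cost grows linearly rather than quadratically with document length. As a minimal illustration (not the authors' implementation), the core sliding-window pattern can be sketched as a boolean attention mask:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where token i may attend to token j only if
    |i - j| <= window // 2 -- the Longformer-style local pattern.

    A full Longformer additionally marks a few positions as "global"
    (attending to, and attended by, every token); that is omitted here.
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

# Each token attends to itself and up to 2 neighbours on each side.
mask = sliding_window_mask(seq_len=8, window=4)
```

For a sequence of length n, the number of allowed attention pairs is O(n * window) instead of O(n^2), which is what makes document-level inputs like full narratives tractable.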

Authors (3)
Citations (2)