
ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis (2307.01387v1)

Published 3 Jul 2023 in cs.CL

Abstract: The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated, as scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present Alberti, the first multilingual pre-trained LLM for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English, and German. In both cases, Alberti outperforms multilingual BERT and other transformer-based models of similar size, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
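
The abstract does not spell out the training objective used for DSP. As a rough illustration only, the sketch below assumes the standard BERT masked-language-modelling recipe (select about 15% of tokens; of those, 80% become a mask token, 10% a random token, 10% stay unchanged) applied to a single verse. The function name `mask_tokens`, the whitespace tokenization, and the toy vocabulary are hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol, as in BERT's vocabulary


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list for masked-language-modelling (assumed objective).

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position contributes to the loss
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: mask symbol
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: left unchanged
        else:
            labels.append(None)  # not selected, ignored by the loss
            corrupted.append(tok)
    return corrupted, labels


# Toy usage: one verse, split on whitespace (a real setup would use
# the multilingual BERT subword tokenizer instead).
verse = "Shall I compare thee to a summer's day".split()
corrupted, labels = mask_tokens(verse, vocab=verse, seed=1)
print(corrupted)
print(labels)
```

In a real pipeline this corruption step would run over every verse in the corpus, and the existing multilingual BERT weights would be updated to predict the selected tokens, which is what "further trained" suggests under this assumption.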

Authors (4)
  1. Javier de la Rosa (12 papers)
  2. Álvaro Pérez Pozo (1 paper)
  3. Salvador Ros (4 papers)
  4. Elena González-Blanco (3 papers)
