ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis (2307.01387v1)

Published 3 Jul 2023 in cs.CL

Abstract: The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated because scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present Alberti, the first multilingual pre-trained LLM for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English, and German. In both cases, Alberti outperforms multilingual BERT and other transformer-based models of similar size, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.
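The abstract does not spell out the pre-training objective, so the following is only a minimal sketch, assuming DSP here means continuing the standard BERT masked-language-modeling objective on verse text. It shows the usual BERT corruption rule (select about 15% of tokens; of those, 80% become `[MASK]`, 10% become a random token, 10% stay unchanged). The function name `mask_tokens`, the whitespace tokenization, and the example verse are illustrative assumptions, not details taken from the paper.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Corrupt a token list for masked language modeling (hypothetical helper).

    Returns (inputs, labels). labels[i] holds the original token at every
    position selected for prediction and None everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)     # 80%: replace with the mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)      # 10%: leave the token unchanged
        else:
            inputs.append(tok)          # not selected: no loss at this position
            labels.append(None)
    return inputs, labels

# Illustrative verse, split on whitespace instead of a real subword tokenizer.
verse = "en tanto que de rosa y azucena".split()
inputs, labels = mask_tokens(verse, vocab=verse, mask_prob=0.15, seed=0)
```

In an actual run the tokens would come from the multilingual BERT subword tokenizer and the loss would be computed only at positions where `labels` is not `None`.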

Authors (4)

Javier de la Rosa
Álvaro Pérez Pozo
Salvador Ros
Elena González-Blanco
