LiMe: a Latin Corpus of Late Medieval Criminal Sentences (2404.12829v1)
Abstract: The Latin language has received attention from the computational linguistics research community, which has built, over the years, several valuable resources, ranging from richly annotated corpora to sophisticated tools for linguistic analysis. With the recent advent of LLMs, researchers have also started developing models capable of generating vector representations of Latin texts. The performance of such models still lags behind that of models for modern languages, given the disparity in available data. In this paper, we present the LiMe dataset, a corpus of 325 documents extracted from a series of medieval manuscripts called Libri sententiarum potestatis Mediolani and thoroughly annotated by experts, so that it can be used for masked language modeling as well as for supervised natural language processing tasks.