Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 86 tok/s
Gemini 2.5 Pro 51 tok/s Pro
GPT-5 Medium 43 tok/s
GPT-5 High 37 tok/s Pro
GPT-4o 98 tok/s
GPT OSS 120B 466 tok/s Pro
Kimi K2 225 tok/s Pro
2000 character limit reached

Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages (2506.17715v1)

Published 21 Jun 2025 in cs.CL and cs.LG

Abstract: Part-of-speech (POS) tagging remains a foundational component in natural language processing pipelines, particularly critical for historical text analysis at the intersection of computational linguistics and digital humanities. Despite significant advancements in modern LLMs for ancient languages, their application to Medieval Romance languages presents distinctive challenges stemming from diachronic linguistic evolution, spelling variations, and labeled data scarcity. This study systematically investigates the central determinants of POS tagging performance across diverse corpora of Medieval Occitan, Medieval Spanish, and Medieval French texts, spanning biblical, hagiographical, medical, and dietary domains. Through rigorous experimentation, we evaluate how fine-tuning approaches, prompt engineering, model architectures, decoding strategies, and cross-lingual transfer learning techniques affect tagging accuracy. Our results reveal both notable limitations in LLMs' ability to process historical language variations and non-standardized spelling, as well as promising specialized techniques that effectively address the unique challenges presented by low-resource historical languages.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.