
ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin (2404.05839v2)

Published 8 Apr 2024 in cs.CL

Abstract: We present LatinPipe, the winning submission to the EvaLatin 2024 Dependency Parsing shared task. Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology to jointly learn both dependency parsing and morphological analysis. It is trained by sampling from seven publicly available Latin corpora, utilizing additional harmonization of annotations to achieve a more unified annotation style. Before fine-tuning, we train the system for a few initial epochs with frozen weights. We also add additional local relative contextualization by stacking the BiLSTM layers on top of the Transformer(s). Finally, we ensemble output probability distributions from seven randomly instantiated networks for the final submission. The code is available at https://github.com/ufal/evalatin2024-latinpipe.
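The final ensembling step described in the abstract — combining output probability distributions from seven randomly instantiated networks — can be illustrated with a minimal sketch. This is a hypothetical pure-Python illustration of probability averaging, not code from the LatinPipe repository; the function name `ensemble_predict` and the toy distributions are invented for demonstration.

```python
def ensemble_predict(distributions):
    """Average per-label probabilities across models and return the argmax.

    `distributions` is a list of per-model probability vectors over the
    same label set (e.g. dependency relations or morphological tags).
    """
    n_models = len(distributions)
    n_labels = len(distributions[0])
    # Element-wise mean of the models' probability vectors.
    averaged = [
        sum(dist[i] for dist in distributions) / n_models
        for i in range(n_labels)
    ]
    # The ensemble prediction is the label with the highest mean probability.
    return max(range(n_labels), key=lambda i: averaged[i]), averaged

# Three toy "models" disagreeing on a 3-label decision:
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.3, 0.2],
]
label, avg = ensemble_predict(probs)
```

Averaging the distributions (rather than, say, majority voting over hard labels) lets a model that is confidently right outweigh two that are weakly wrong, which is why probability-level ensembling is a common choice for this kind of classification head.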
