Dependency Annotation of Ottoman Turkish with Multilingual BERT (2402.14743v2)
Published 22 Feb 2024 in cs.CL
Abstract: This study introduces a pretrained large language model-based annotation methodology for the first dependency treebank in Ottoman Turkish. Our experimental results show that, by iteratively i) pseudo-annotating data using a multilingual BERT-based parsing model, ii) manually correcting the pseudo-annotations, and iii) fine-tuning the parsing model with the corrected annotations, we speed up and simplify the challenging dependency annotation process. The resulting treebank, which will be a part of the Universal Dependencies (UD) project, will facilitate automated analysis of Ottoman Turkish documents, unlocking the linguistic richness embedded in this historical heritage.
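The loop described in the abstract can be summarized as a simple iterate-correct-retrain procedure. Below is a minimal sketch of that workflow; `parser`, `correct_manually`, and `fine_tune` are hypothetical placeholders standing in for the parsing model, the human annotation step, and the retraining step, not the authors' actual implementation or any specific library API.

```python
from typing import Callable, Iterable, List, Sequence


def iterative_annotation(
    parser,
    unlabeled_batches: Iterable[Sequence[str]],
    seed_treebank: List,
    correct_manually: Callable[[List], List],
    fine_tune: Callable,
):
    """Grow a treebank by alternating (i) pseudo-annotation, (ii) manual
    correction, and (iii) fine-tuning, as outlined in the abstract.

    Assumptions: `parser` exposes a `parse(sentence)` method returning a
    dependency tree; `correct_manually` stands in for the human annotator;
    `fine_tune` retrains the parser on the corrected trees. All three are
    illustrative placeholders.
    """
    treebank = list(seed_treebank)  # manually verified trees collected so far
    for batch in unlabeled_batches:
        # i) pseudo-annotate the next batch with the current parsing model
        pseudo_trees = [parser.parse(sentence) for sentence in batch]
        # ii) an annotator reviews and corrects the pseudo-annotations
        corrected = correct_manually(pseudo_trees)
        treebank.extend(corrected)
        # iii) fine-tune the parser on all corrected trees before the next round
        parser = fine_tune(parser, treebank)
    return treebank, parser
```

Each round thus gives the annotators progressively better starting parses, which is where the reported speed-up in annotation comes from.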
Authors:
- Şaziye Betül Özateş
- Tarık Emre Tıraş
- Efe Eren Genç
- Esma Fatıma Bilgin Taşdemir