
Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation (2404.07792v1)

Published 11 Apr 2024 in cs.CL and cs.LG

Abstract: This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection. Given the low-resource environment of Latin and the complexity of sentiment in rhetorical genres like poetry, we augmented the available data through automatic polarity annotation. We present two methods for doing so on the basis of the $k$-means algorithm, and we employ a variety of Latin LLMs in a neural architecture to better capture the underlying contextual sentiment representations. Our best approach achieved the second-highest macro-averaged $F_1$ score on the shared task's test set.
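The core technical move is the $k$-means-based automatic polarity annotation. Below is a minimal sketch of one way such label propagation can work: cluster sentence embeddings of gold-annotated and unannotated Latin text together, then assign each unannotated sentence the majority gold polarity of its cluster. This is an illustration of the general idea under assumed names and parameters (the function `propagate_polarity`, `n_clusters=4`, and the scikit-learn stack are all assumptions), not the paper's two specific methods.

```python
# Hedged sketch: majority-vote cluster labelling over sentence embeddings.
# The function name, the choice of scikit-learn, and n_clusters=4 are
# assumptions for illustration, not the authors' implementation.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def propagate_polarity(gold_embs, gold_labels, unlabeled_embs,
                       n_clusters=4, seed=0):
    """Cluster gold + unlabeled embeddings with k-means, then give each
    unlabeled sentence the majority gold polarity of its cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(np.vstack([gold_embs, unlabeled_embs]))

    gold_assign = km.predict(gold_embs)
    gold_labels = np.asarray(gold_labels)
    # Majority polarity per cluster, computed from the human-annotated points.
    cluster_label = {
        c: Counter(gold_labels[gold_assign == c]).most_common(1)[0][0]
        for c in np.unique(gold_assign)
    }
    # Clusters containing no gold points fall back to the global majority.
    fallback = Counter(gold_labels.tolist()).most_common(1)[0][0]
    return [cluster_label.get(c, fallback)
            for c in km.predict(unlabeled_embs)]
```

In this reading, the embeddings would come from one of the Latin models the abstract alludes to (e.g., Latin BERT), and the resulting silver-labeled sentences would be mixed into training data for the neural polarity classifier.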
