The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change (2311.12664v2)
Abstract: We present the DURel tool, which implements the annotation of semantic proximity between word uses in an online, open-source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This makes it possible to measure word senses with simple, intuitive micro-task judgments between use pairs, requiring minimal preparation effort. The tool also offers functionality to compare agreement between annotators, to ensure the inter-subjectivity of the obtained judgments, and to calculate summary statistics that give insights into sense frequency distributions, semantic variation, or changes of senses over time.
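The pipeline summarized above (use-pair judgments, graph clustering, summary statistics over time) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state. Judgments are taken to lie on a 1-4 relatedness scale. Each use pair's median judgment is shifted by an assumed threshold of 2.5 to give a signed edge weight. The graph is then partitioned by correlation clustering, one common graph clustering technique, using a simple greedy local search as a stand-in for whatever solver the tool actually uses. Finally, change between two time periods is summarized as the Jensen-Shannon divergence of their cluster frequency distributions. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median
import math

THRESHOLD = 2.5  # assumption: midpoint of a 1-4 relatedness scale


def edge_weights(judgments):
    """Map each unordered use pair to (median judgment - THRESHOLD)."""
    pooled = defaultdict(list)
    for (a, b), scores in judgments.items():
        pooled[tuple(sorted((a, b)))].extend(scores)
    return {pair: median(s) - THRESHOLD for pair, s in pooled.items()}


def loss(labels, weights):
    """Correlation-clustering loss: negative edges inside clusters
    plus positive edges between clusters."""
    total = 0.0
    for (a, b), w in weights.items():
        same = labels[a] == labels[b]
        if same and w < 0:
            total -= w
        elif not same and w > 0:
            total += w
    return total


def correlation_clustering(nodes, weights, max_rounds=100):
    """Greedy local search (illustrative stand-in, not the tool's solver):
    move single nodes between clusters while the loss strictly drops."""
    labels = {n: i for i, n in enumerate(nodes)}  # start from singletons
    current = loss(labels, weights)
    for _ in range(max_rounds):
        improved = False
        for n in nodes:
            original = labels[n]
            best_label, best_loss = original, current
            # try every existing cluster plus one fresh, empty cluster
            for c in sorted(set(labels.values()) | {max(labels.values()) + 1}):
                if c == original:
                    continue
                labels[n] = c
                candidate = loss(labels, weights)
                if candidate < best_loss:
                    best_label, best_loss = c, candidate
            labels[n] = best_label
            if best_label != original:
                current = best_loss
                improved = True
        if not improved:
            break
    return labels


def cluster_distribution(labels, uses):
    """Relative frequency of each cluster (sense) among the given uses."""
    counts = defaultdict(int)
    for u in uses:
        counts[labels[u]] += 1
    return {c: k / len(uses) for c, k in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two cluster distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical toy data: u1-u3 share one sense, u4-u6 another.
group_a, group_b = ["u1", "u2", "u3"], ["u4", "u5", "u6"]
uses = group_a + group_b
judgments = {}
for a, b in combinations(uses, 2):
    related = (a in group_a) == (b in group_a)
    judgments[(a, b)] = [4, 4, 3] if related else [1, 2, 1]

labels = correlation_clustering(uses, edge_weights(judgments))
old_period, new_period = ["u1", "u2", "u3", "u4"], ["u5", "u6"]
change = jensen_shannon(cluster_distribution(labels, old_period),
                        cluster_distribution(labels, new_period))
print(sorted(set(labels.values())), round(change, 3))  # [1, 4] 0.549
```

In the toy run the search recovers the two groups as two clusters. The older period then splits 3:1 across them, while the newer period uses only the second cluster, which yields a divergence of about 0.549. The greedy search is only a local optimum finder; for realistic graphs a stronger correlation-clustering solver would be the more reasonable choice.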